Preface


Paul Erdős liked to talk about The Book, in which God maintains the perfect
proofs for mathematical theorems, following the dictum of G. H. Hardy that
there is no permanent place for ugly mathematics. Erdős also said that you
need not believe in God but, as a mathematician, you should believe in
The Book. A few years ago, we suggested to him to write up a first (and
very modest) approximation to The Book. He was enthusiastic about the
idea and, characteristically, went to work immediately, filling page after
page with his suggestions. Our book was supposed to appear in March
1998 as a present for Erdős’ 85th birthday. With Paul’s unfortunate death
in the summer of 1996, he is not listed as a co-author. Instead this book is
dedicated to his memory.

We have no definition or characterization of what constitutes a proof from
The Book: all we offer here is the examples that we have selected, hoping
that our readers will share our enthusiasm about brilliant ideas, clever
insights and wonderful observations. We also hope that our readers will
enjoy this despite the imperfections of our exposition. The selection is to a
great extent influenced by Paul Erdős himself. A large number of the topics
were suggested by him, and many of the proofs trace directly back to him,
or were initiated by his supreme insight in asking the right question or in
making the right conjecture. So to a large extent this book reflects the views
of Paul Erdős as to what should be considered a proof from The Book.

A limiting factor for our selection of topics was that everything in this book 
is supposed to be accessible to readers whose backgrounds include only 
a modest amount of technique from undergraduate mathematics. A little 
linear algebra, some basic analysis and number theory, and a healthy dollop 
of elementary concepts and reasonings from discrete mathematics should 
be sufficient to understand and enjoy everything in this book. 

We are extremely grateful to the many people who helped and supported
us with this project, among them the students of a seminar where we
discussed a preliminary version, Benno Artmann, Stephan Brandt, Stefan
Felsner, Eli Goodman, Torsten Heldmann, and Hans Mielke. We thank
Margrit Barrett, Christian Bressler, Ewgenij Gawrilow, Michael Joswig,
Elke Pose, and Jörg Rambau for their technical help in composing this
book. We are greatly indebted to Tom Trotter who read the manuscript from
first to last page, to Karl H. Hofmann for his wonderful drawings, and
most of all to the late great Paul Erdős himself.


Berlin, March 1998    Martin Aigner · Günter M. Ziegler
Preface to the Sixth Edition


The idea for this project was born during some leisurely discussions at the
Mathematisches Forschungsinstitut in Oberwolfach with the incomparable
Paul Erdős in the mid-1990s. It is now nearly twenty years ago that we
presented the first edition of our book on the occasion of the International
Congress of Mathematicians in Berlin 1998. At that time we could not
possibly imagine the wonderful and lasting response our book about The
Book would have, with all the warm letters, interesting comments and
suggestions, new editions, and as of now thirteen translations. It is no
exaggeration to say that it has become a part of our lives.

In addition to numerous improvements and smaller changes, many of them
suggested by our readers, for the present sixth edition we wrote an entirely
new chapter with Gurvits’s proof of Van der Waerden’s permanent
conjecture, used this to derive asymptotics for the number of Latin squares,
added a new, fourth proof for the Euler theorem $\sum_{n\ge1}\frac{1}{n^2} = \frac{\pi^2}{6}$, and present
a new geometric explanation for Heath-Brown’s involution proof for the
Fermat two squares theorem.

We thank everyone who helped and encouraged us over all these years. For
the second edition this included Stephan Brandt, Christian Elsholtz, Jürgen
Elstrodt, Daniel Grieser, Roger Heath-Brown, Lee L. Keener, Christian
Lebœuf, Hanfried Lenz, Nicolas Puech, John Scholes, Bernulf Weißbach,
and many others. The third edition benefitted especially from input by
David Bevan, Anders Björner, Dietrich Braess, John Cosgrave, Hubert
Kalf, Günter Pickert, Alistair Sinclair, and Herb Wilf. For the fourth
edition, we were particularly indebted to Oliver Deiser, Anton Dochtermann,
Michael Harbeck, Stefan Hougardy, Hendrik W. Lenstra, Günter Rote,
Moritz W. Schmitt, and Carsten Schultz for their contributions. For the fifth
edition, we gratefully acknowledged ideas and suggestions by Ian Agol,
France Dacar, Christopher Deninger, Michael D. Hirschhorn, Franz
Lemmermeyer, Raimund Seidel, Tord Sjödin, and John M. Sullivan, as
well as help from Marie-Sophie Litz, Miriam Schlöter, and Jan Schneider.
For the present sixth edition, very valuable hints were provided by France
Dacar again, as well as by David Benko, Jan Peter Schäfermeyer, and
Yuliya Semikina.

Moreover, we thank Ruth Allewelt at Springer in Heidelberg and Christoph
Eyrich, Torsten Heldmann, and Elke Pose in Berlin for their continuing
support throughout these years. And finally, this book would certainly not look
the same without the original design suggested by Karl-Friedrich Koch, and
the superb new drawings provided again and again by Karl H. Hofmann.


Berlin, March 2018    Martin Aigner · Günter M. Ziegler


Six proofs 
of the infinity of primes 


It is only natural that we start these notes with probably the oldest Book 
Proof, usually attributed to Euclid (Elements IX, 20). It shows that the 
sequence of primes does not end. 


■ Euclid’s Proof. For any finite set $\{p_1,\dots,p_r\}$ of primes, consider
the number $n = p_1p_2\cdots p_r + 1$. This $n$ has a prime divisor $p$. But $p$ is
not one of the $p_i$: otherwise $p$ would be a divisor of $n$ and of the product
$p_1p_2\cdots p_r$, and thus also of the difference $n - p_1p_2\cdots p_r = 1$, which is
impossible. So a finite set $\{p_1,\dots,p_r\}$ cannot be the collection of all prime
numbers. □
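The construction in Euclid’s proof can be tried out directly. The following Python sketch is only an illustration (the helper names `smallest_prime_factor` and `euclid_new_prime` are ours): it forms $p_1\cdots p_r + 1$ for a given list of primes and finds a prime divisor of it by trial division, which is assumed to be fast enough for small inputs.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime divisor of an integer n >= 2 (trial division)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def euclid_new_prime(primes: list[int]) -> int:
    """Return a prime divisor of p1*p2*...*pr + 1; it cannot be one of the p_i."""
    n = prod(primes) + 1
    p = smallest_prime_factor(n)
    assert p not in primes  # otherwise p would divide n - prod(primes) = 1
    return p


print(euclid_new_prime([2, 3, 5, 7, 11, 13]))  # 30031 = 59 * 509, so this prints 59
```

The example also shows that $p_1\cdots p_r + 1$ itself need not be prime: $2\cdot3\cdot5\cdot7\cdot11\cdot13 + 1 = 30031 = 59 \cdot 509$.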


Before we continue let us fix some notation. $\mathbb{N} = \{1,2,3,\dots\}$ is the set
of natural numbers, $\mathbb{Z} = \{\dots,-2,-1,0,1,2,\dots\}$ the set of integers, and
$\mathbb{P} = \{2,3,5,7,\dots\}$ the set of primes.

In the following, we will exhibit various other proofs (out of a much longer
list) which we hope the reader will like as much as we do. Although they
use different viewpoints, the following basic idea is common to all of them:
The natural numbers grow beyond all bounds, and every natural number
$n \ge 2$ has a prime divisor. These two facts taken together force $\mathbb{P}$ to be
infinite. The next proof is due to Christian Goldbach (from a letter to
Leonhard Euler 1730), the third proof is apparently folklore, the fourth one is
by Euler himself, the fifth proof was proposed by Harry Fürstenberg, while
the last proof is due to Paul Erdős.


■ Second Proof. Let us first look at the Fermat numbers $F_n = 2^{2^n} + 1$ for
$n = 0,1,2,\dots$. We will show that any two Fermat numbers are relatively
prime; hence there must be infinitely many primes. To this end, we verify
the recursion
$$\prod_{k=0}^{n-1} F_k = F_n - 2 \qquad (n \ge 1),$$
from which our assertion follows immediately. Indeed, if $m$ is a divisor of,
say, $F_k$ and $F_n$ ($k < n$), then $m$ divides 2, and hence $m = 1$ or 2. But
$m = 2$ is impossible since all Fermat numbers are odd.


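To prove the recursion we use induction on $n$. For $n = 1$ we have $F_0 = 3$
and $F_1 - 2 = 3$. With induction we now conclude
$$\prod_{k=0}^{n} F_k = \Big(\prod_{k=0}^{n-1} F_k\Big)F_n = (F_n - 2)F_n = \big(2^{2^n} - 1\big)\big(2^{2^n} + 1\big) = 2^{2^{n+1}} - 1 = F_{n+1} - 2. \qquad \square$$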
$F_0 = 3$
$F_1 = 5$
$F_2 = 17$
$F_3 = 257$
$F_4 = 65537$
$F_5 = 641 \cdot 6700417$

The first few Fermat numbers
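A quick numerical check of the recursion and of its consequence, as a Python sketch (the function name `fermat` is ours). Exact integer arithmetic makes the check reliable for the first few Fermat numbers, though of course it says nothing beyond them.

```python
from math import gcd, prod


def fermat(n: int) -> int:
    """The n-th Fermat number F_n = 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1


# The recursion F_0 * F_1 * ... * F_{n-1} = F_n - 2, checked for n = 1, ..., 7.
for n in range(1, 8):
    assert prod(fermat(k) for k in range(n)) == fermat(n) - 2

# Its consequence: distinct Fermat numbers are pairwise relatively prime.
for i in range(8):
    for j in range(i + 1, 8):
        assert gcd(fermat(i), fermat(j)) == 1

print(fermat(5) == 641 * 6700417)  # True: F_5 is composite
```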


Lagrange’s theorem

If $G$ is a finite (multiplicative) group and $U$ is a subgroup, then $|U|$
divides $|G|$.

■ Proof. Consider the binary relation
$$a \sim b \;:\Longleftrightarrow\; ba^{-1} \in U.$$
It follows from the group axioms that $\sim$ is an equivalence relation.
The equivalence class containing an element $a$ is precisely the coset
$$Ua = \{xa : x \in U\}.$$
Since clearly $|Ua| = |U|$, we find that $G$ decomposes into equivalence
classes, all of size $|U|$, and hence that $|U|$ divides $|G|$. □

In the special case when $U$ is a cyclic subgroup $\{a, a^2, \dots, a^m\}$ we find
that $m$ (the smallest positive integer such that $a^m = 1$, called the
order of $a$) divides the size $|G|$ of the group. In particular, we have
$a^{|G|} = 1$.

(Figure: steps above the function $f(t) = \frac{1}{t}$, with the points $1, 2, \dots, n, n+1$ marked on the $t$-axis.)


■ Third Proof. Suppose $\mathbb{P}$ is finite and $p$ is the largest prime. We consider
the so-called Mersenne number $2^p - 1$ and show that any prime factor $q$
of $2^p - 1$ is bigger than $p$, which will yield the desired conclusion. Let $q$ be
a prime dividing $2^p - 1$, so we have $2^p \equiv 1 \pmod q$. Since $p$ is prime, this
means that the element 2 has order $p$ in the multiplicative group $\mathbb{Z}_q\setminus\{0\}$ of
the field $\mathbb{Z}_q$. This group has $q - 1$ elements. By Lagrange’s theorem (see
the box) we know that the order of every element divides the size of the
group, that is, we have $p \mid q - 1$, and hence $p < q$. □
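The key step of the Third Proof, that every prime $q$ dividing $2^p - 1$ satisfies $p \mid q - 1$, can be observed for small $p$. In this illustrative Python sketch the helper names are ours, and the order of 2 modulo $q$ is found by naive repeated multiplication, which is only practical because the orders involved here are small.

```python
def multiplicative_order(a: int, q: int) -> int:
    """Smallest m >= 1 with a^m == 1 (mod q); assumes gcd(a, q) == 1."""
    m, x = 1, a % q
    while x != 1:
        x = (x * a) % q
        m += 1
    return m


def prime_factors(n: int) -> list[int]:
    """Distinct prime factors of n >= 2, in increasing order (trial division)."""
    factors, d = [], 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


for p in [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31]:
    for q in prime_factors(2 ** p - 1):
        assert multiplicative_order(2, q) == p  # 2 has order p in Z_q \ {0}
        assert (q - 1) % p == 0 and q > p       # Lagrange: p divides q - 1
```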


Now let us look at a proof that uses elementary calculus. 


■ Fourth Proof. Let $\pi(x) := \#\{p \le x : p \in \mathbb{P}\}$ be the number of primes
that are less than or equal to the real number $x$. We number the primes
$\mathbb{P} = \{p_1, p_2, p_3, \dots\}$ in increasing order. Consider the natural logarithm
$\log x$, defined as $\log x = \int_1^x \frac{1}{t}\,dt$.

Now we compare the area below the graph of $f(t) = \frac{1}{t}$ with an upper step
function. (See also the appendix on page 12 for this method.) Thus for
$n \le x < n+1$ we have
$$\log x \le 1 + \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{n-1} + \frac{1}{n} \le \sum \frac{1}{m},$$
where the sum extends over all $m \in \mathbb{N}$ which have only prime divisors $p \le x$.

Since every such $m$ can be written in a unique way as a product of the form
$\prod_{p \le x} p^{k_p}$, we see that the last sum is equal to
$$\prod_{\substack{p \in \mathbb{P} \\ p \le x}} \Big(\sum_{k \ge 0} \frac{1}{p^k}\Big).$$
The inner sum is a geometric series with ratio $\frac{1}{p}$, hence
$$\log x \le \prod_{\substack{p \in \mathbb{P} \\ p \le x}} \frac{1}{1 - \frac{1}{p}} = \prod_{\substack{p \in \mathbb{P} \\ p \le x}} \frac{p}{p-1} = \prod_{k=1}^{\pi(x)} \frac{p_k}{p_k - 1}.$$
Now clearly $p_k \ge k + 1$, and thus
$$\frac{p_k}{p_k - 1} = 1 + \frac{1}{p_k - 1} \le 1 + \frac{1}{k} = \frac{k+1}{k},$$
and therefore
$$\log x \le \prod_{k=1}^{\pi(x)} \frac{k+1}{k} = \pi(x) + 1.$$
Everybody knows that $\log x$ is not bounded, so we conclude that $\pi(x)$ is
unbounded as well, and so there are infinitely many primes. □
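For a numerical feel for the Fourth Proof, the following Python sketch (helper names ours) sieves the primes up to $x$ and checks the chain $\log x \le \prod_{p\le x} \frac{p}{p-1} \le \pi(x) + 1$ for a few arbitrarily chosen values of $x$.

```python
import math


def primes_up_to(x: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p <= x."""
    if x < 2:
        return []
    is_prime = [True] * (x + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(x) + 1):
        if is_prime[i]:
            for j in range(i * i, x + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


for x in [2, 10, 100, 1_000, 10_000]:
    ps = primes_up_to(x)
    euler_product = math.prod(p / (p - 1) for p in ps)
    # log x <= prod_{p <= x} p/(p-1) <= pi(x) + 1, as in the Fourth Proof
    assert math.log(x) <= euler_product <= len(ps) + 1
    print(x, round(math.log(x), 3), round(euler_product, 3), len(ps) + 1)
```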




■ Fifth Proof. After analysis it’s topology now! Consider the following
curious topology on the set $\mathbb{Z}$ of integers. For $a, b \in \mathbb{Z}$, $b > 0$, we set
$$N_{a,b} = \{a + nb : n \in \mathbb{Z}\}.$$
Each set $N_{a,b}$ is a two-way infinite arithmetic progression. Now call a set
$O \subseteq \mathbb{Z}$ open if either $O$ is empty, or if to every $a \in O$ there exists some
$b > 0$ with $N_{a,b} \subseteq O$. Clearly, the union of open sets is open again. If
$O_1, O_2$ are open, and $a \in O_1 \cap O_2$ with $N_{a,b_1} \subseteq O_1$ and $N_{a,b_2} \subseteq O_2$,
then $a \in N_{a,b_1b_2} \subseteq O_1 \cap O_2$. So we conclude that any finite intersection
of open sets is again open. So, this family of open sets induces a bona fide
topology on $\mathbb{Z}$.


Let us note two facts:
(A) Any nonempty open set is infinite.
(B) Any set $N_{a,b}$ is closed as well.

Indeed, the first fact follows from the definition. For the second we observe
$$N_{a,b} = \mathbb{Z} \setminus \bigcup_{i=1}^{b-1} N_{a+i,b},$$
which proves that $N_{a,b}$ is the complement of an open set and hence closed.


So far the primes have not yet entered the picture, but here they come.
Since any number $n \ne 1, -1$ has a prime divisor $p$, and hence is contained
in $N_{0,p}$, we conclude
$$\mathbb{Z} \setminus \{1, -1\} = \bigcup_{p \in \mathbb{P}} N_{0,p}.$$
Now if $\mathbb{P}$ were finite, then $\bigcup_{p\in\mathbb{P}} N_{0,p}$ would be a finite union of closed sets
(by (B)), and hence closed. Consequently, $\{1, -1\}$ would be an open set,
in violation of (A). □
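The two set identities used in the Fifth Proof can be spot-checked on a finite window of integers. This Python sketch (names ours) is only a sanity check: the topology lives on all of $\mathbb{Z}$, and the window $-1000 \le n \le 1000$ as well as the sample pairs $(a, b)$ are arbitrary choices.

```python
def in_progression(n: int, a: int, b: int) -> bool:
    """Membership test for N_{a,b} = {a + k*b : k in Z}, where b > 0."""
    return (n - a) % b == 0


def smallest_prime_factor(n: int) -> int:
    """Smallest prime dividing |n|, for |n| >= 2 (trial division)."""
    n = abs(n)
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


window = range(-1000, 1001)  # a finite window of Z: a sanity check, not a proof

# Fact (B): N_{a,b} is the complement of N_{a+1,b} u ... u N_{a+b-1,b}.
for a, b in [(0, 2), (4, 6), (-3, 7)]:
    for n in window:
        in_others = any(in_progression(n, a + i, b) for i in range(1, b))
        assert in_progression(n, a, b) != in_others

# Every n other than 1 and -1 lies in N_{0,p} for some prime p (p = 2 covers n = 0).
covered = set()
for n in window:
    if n in (1, -1):
        continue
    p = 2 if n == 0 else smallest_prime_factor(n)
    assert in_progression(n, 0, p)
    covered.add(n)
print(sorted(set(window) - covered))  # [-1, 1]
```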


■ Sixth Proof. Our final proof goes a considerable step further and
demonstrates not only that there are infinitely many primes, but also that
the series $\sum_{p\in\mathbb{P}} \frac{1}{p}$ diverges. The first proof of this important result was
given by Euler (and is interesting in its own right), but our proof, devised
by Erdős, is of compelling beauty.

Let $p_1, p_2, p_3, \dots$ be the sequence of primes in increasing order, and
assume that $\sum_{p\in\mathbb{P}} \frac{1}{p}$ converges. Then there must be a natural number $k$
such that $\sum_{i\ge k+1} \frac{1}{p_i} < \frac{1}{2}$. Let us call $p_1, \dots, p_k$ the small primes, and
$p_{k+1}, p_{k+2}, \dots$ the big primes. For an arbitrary natural number $N$ we
therefore find
$$\sum_{i\ge k+1} \frac{N}{p_i} < \frac{N}{2}. \qquad (1)$$


(Figure: “Pitching flat rocks, infinitely”)




(Figure: portrait of Issai Schur)


Let $N_b$ be the number of positive integers $n \le N$ which are divisible by at
least one big prime, and $N_s$ the number of positive integers $n \le N$ which
have only small prime divisors. We are going to show that for a suitable $N$
$$N_b + N_s < N,$$
which will be our desired contradiction, since by definition $N_b + N_s$ would
have to be equal to $N$.

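To estimate $N_b$ note that $\lfloor N/p_i \rfloor$ counts the positive integers $n \le N$ which
are multiples of $p_i$. Hence by (1) we obtain
$$N_b \le \sum_{i\ge k+1} \Big\lfloor \frac{N}{p_i} \Big\rfloor < \frac{N}{2}. \qquad (2)$$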


Let us now look at $N_s$. We write every $n \le N$ which has only small prime
divisors in the form $n = a_n b_n^2$, where $a_n$ is the square-free part. Every $a_n$
is thus a product of different small primes, and we conclude that there are
precisely $2^k$ different square-free parts. Furthermore, as $b_n \le \sqrt{n} \le \sqrt{N}$,
we find that there are at most $\sqrt{N}$ different square parts, and so
$$N_s \le 2^k \sqrt{N}.$$
Since (2) holds for any $N$, it remains to find a number $N$ with $2^k\sqrt{N} \le \frac{N}{2}$
or $2^{k+1} \le \sqrt{N}$, and for this $N = 2^{2k+2}$ will do. □
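The counting bound $N_s \le 2^k\sqrt{N}$ can be compared with an actual count. In the Python sketch below, the set of small primes $2, 3, 5, 7$ (so $k = 4$) is a hypothetical example, the helper name is ours, and $\lfloor\sqrt N\rfloor$ is used for the number of possible square parts $b_n$.

```python
import math


def has_only_small_prime_divisors(n: int, small_primes: list[int]) -> bool:
    """True if every prime divisor of n lies in small_primes (n = 1 counts as True)."""
    for p in small_primes:
        while n % p == 0:
            n //= p
    return n == 1


small_primes = [2, 3, 5, 7]  # hypothetical choice: the k = 4 "small" primes
k = len(small_primes)

for N in [100, 1_000, 10_000, 2 ** (2 * k + 2)]:
    N_s = sum(1 for n in range(1, N + 1) if has_only_small_prime_divisors(n, small_primes))
    # n = a_n * b_n^2: at most 2^k square-free parts a_n and floor(sqrt(N)) values of b_n
    bound = 2 ** k * math.isqrt(N)
    assert N_s <= bound
    print(N, N_s, bound)
```

For $N = 2^{2k+2} = 1024$ the bound equals $2^{2k+1} = N/2$, which is exactly the choice made in the proof.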


Appendix: Infinitely many more proofs 


Our collection of proofs for the infinitude of primes contains several other
old and new treasures, but there is one of very recent vintage that is quite
different and deserves special mention. Let us try to identify sequences $S$
of integers such that the set of primes $\mathbb{P}_S$ that divide some member of $S$
is infinite. Every such sequence would then provide its own proof for the
infinity of primes. The Fermat numbers $F_n$ studied in the second proof
form such a sequence, while the powers of 2 don’t. Many more examples
are provided by a theorem of Issai Schur, who showed in 1912 that for
every nonconstant polynomial $p(x)$ with integer coefficients the set of all
nonzero values $\{p(n) \ne 0 : n \in \mathbb{N}\}$ is such a sequence. For the polynomial
$p(x) = x$, Schur’s result gives us Euclid’s theorem. As another example,
for $p(x) = x^2 + 1$ we get that the “squares plus one” contain infinitely
many different prime factors.
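Schur’s example $p(x) = x^2 + 1$ is easy to explore numerically. The Python sketch below (names ours) collects the primes dividing some value $n^2 + 1$ with $n \le 100$; a finite computation can of course only illustrate, not prove, that this set is infinite.

```python
def prime_factors(n: int) -> set[int]:
    """Set of distinct prime factors of n >= 1 (trial division)."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def primes_dividing_values(poly, N: int) -> set[int]:
    """Primes dividing some nonzero value poly(n) with 1 <= n <= N."""
    found = set()
    for n in range(1, N + 1):
        value = abs(poly(n))
        if value != 0:
            found |= prime_factors(value)
    return found


squares_plus_one = primes_dividing_values(lambda x: x * x + 1, 100)
print(len(squares_plus_one), sorted(squares_plus_one)[:6])
```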

The following result due to Christian Elsholtz is a real gem: It generalizes 
Schur’s theorem, the proof is just clever counting, and it is in a certain sense 
best possible. 




Let $S = (s_1, s_2, s_3, \dots)$ be a sequence of integers. We say that
• $S$ is almost injective if every value occurs at most $c$ times for some
constant $c$,
• $S$ is of subexponential growth if $|s_n| \le 2^{2^{f(n)}}$ for all $n$, where $f: \mathbb{N} \to \mathbb{R}_+$
is a function with $\frac{f(n)}{\log_2 n} \to 0$.

(In place of 2 we could take any other base greater than 1; for example,
$|s_n| \le e^{e^{f(n)}}$ leads to the same class of sequences.)

Theorem. If the sequence $S = (s_1, s_2, s_3, \dots)$ is almost injective and of
subexponential growth, then the set $\mathbb{P}_S$ of primes that divide some member
of $S$ is infinite.


■ Proof. We may assume that $f(n)$ is monotonically increasing. Otherwise,
replace $f(n)$ by $F(n) = \max_{i \le n} f(i)$; you can easily check that with this
$F(n)$ the sequence $S$ again satisfies the subexponential growth condition.


Let us suppose for a contradiction that $\mathbb{P}_S = \{p_1, \dots, p_k\}$ is finite. For
$n \in \mathbb{N}$, let
$$s_n = \varepsilon_n\, p_1^{a_1} \cdots p_k^{a_k}, \qquad \text{with } \varepsilon_n \in \{1, 0, -1\},\ a_i \ge 0,$$
where the $a_i = a_i(n)$ depend on $n$. (For $s_n = 0$ we can put $a_i = 0$ for
all $i$.) Then
$$2^{a_1 + \dots + a_k} \le |s_n| \le 2^{2^{f(n)}} \qquad \text{for } s_n \ne 0,$$
and thus by taking the binary logarithm
$$0 \le a_i \le a_1 + \dots + a_k \le 2^{f(n)} \qquad \text{for } 1 \le i \le k.$$
Hence there are not more than $2^{f(n)} + 1$ different possible values for each
$a_i = a_i(n)$. Since $f$ is monotone, this gives a first estimate
$$\#\{\text{distinct } |s_n| \ne 0 \text{ for } n \le N\} \le \big(2^{f(N)} + 1\big)^k \le 2^{(f(N)+1)k}.$$
On the other hand, since $S$ is almost injective only $c$ terms in the sequence
can be equal to 0, and each nonzero absolute value can occur at most $2c$
times, so we get the lower estimate
$$\#\{\text{distinct } |s_n| \ne 0 \text{ for } n \le N\} \ge \frac{N - c}{2c}.$$
Altogether, this gives
$$\frac{N - c}{2c} \le 2^{k(f(N) + 1)}.$$
Taking again the logarithm with base 2 on both sides, we obtain
$$\log_2(N - c) - \log_2(2c) \le k\big(f(N) + 1\big) \qquad \text{for all } N.$$
This, however, is plainly false for large $N$, as $k$ and $c$ are constants, so
$\frac{\log_2(N - c)}{\log_2 N}$ goes to 1 for $N \to \infty$, while $\frac{f(N)}{\log_2 N}$ goes to 0. □




Can one relax the conditions? At least neither of them is superfluous. 


That we need the “almost injective” condition can be seen from sequences
$S$ like $(2, 2, 2, \dots)$ or $(1, 2, 2, 4, 4, 4, 4, 8, \dots)$, which satisfy the growth
condition, while $\mathbb{P}_S = \{2\}$ is finite.

As for the subexponential growth condition, let us remark that it cannot
be weakened to a requirement of the form $\frac{f(n)}{\log_2 n} \le \varepsilon$ for a fixed $\varepsilon > 0$.
To see this, one analyzes the sequence of all numbers of the form $p_1^{a_1} \cdots p_k^{a_k}$
arranged in increasing order, where $p_1, \dots, p_k$ are fixed primes and $k$ is
large. This sequence $S$ grows roughly like $2^{2^{f(n)}}$ with $f(n) \approx \frac{\log_2 n}{k}$, while $\mathbb{P}_S$
is finite by construction.
Bertrand’s postulate


We have seen that the sequence of prime numbers $2, 3, 5, 7, \dots$ is infinite.
To see that the size of its gaps is not bounded, let $N := 2 \cdot 3 \cdot 5 \cdots p$ denote
the product of all prime numbers that are smaller than $k + 2$, and note that
none of the $k$ numbers
$$N+2,\ N+3,\ N+4,\ \dots,\ N+k,\ N+(k+1)$$
is prime, since for $2 \le i \le k+1$ we know that $i$ has a prime factor that is
smaller than $k + 2$, and this factor also divides $N$, and hence also $N + i$.
With this recipe, we find, for example, for $k = 10$ that none of the ten
numbers
$$2312,\ 2313,\ 2314,\ \dots,\ 2321$$
is prime.
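The recipe for long prime-free runs can be checked directly. This Python sketch (helper names ours) builds $N$ as the product of the primes below $k + 2$ and tests the $k$ numbers $N+2, \dots, N+k+1$ by trial division.

```python
from math import prod


def is_prime(n: int) -> bool:
    """Trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_free_run(k: int) -> range:
    """The k numbers N+2, ..., N+k+1, where N is the product of all primes < k+2."""
    N = prod(p for p in range(2, k + 2) if is_prime(p))
    return range(N + 2, N + k + 2)


run = prime_free_run(10)
print(run.start, run.stop - 1)  # 2312 2321
assert not any(is_prime(n) for n in run)
```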

But there are also upper bounds for the gaps in the sequence of prime
numbers. A famous bound states that “the gap to the next prime cannot be larger
than the number we start our search at.” This is known as Bertrand’s
postulate, since it was conjectured and verified empirically for $n < 3\,000\,000$
by Joseph Bertrand. It was first proved for all $n$ by Pafnuty Chebyshev in
1850. A much simpler proof was given by the Indian genius Ramanujan.
Our Book Proof is by Paul Erdős: it is taken from Erdős’ first published
paper, which appeared in 1932, when Erdős was 19.


Bertrand’s postulate
For every $n \ge 1$, there is some prime number $p$ with $n < p \le 2n$.


■ Proof. We will estimate the size of the binomial coefficient $\binom{2n}{n}$
carefully enough to see that if it didn’t have any prime factors in the range
$n < p \le 2n$, then it would be “too small.” Our argument is in five steps.


(1) We first prove Bertrand’s postulate for $n \le 511$. For this one does not
need to check 511 cases: it suffices (this is “Landau’s trick”) to check that
$$2,\ 3,\ 5,\ 7,\ 13,\ 23,\ 43,\ 83,\ 163,\ 317,\ 521$$
is a sequence of prime numbers, where each is smaller than twice the
previous one. Hence every interval $\{y : n < y \le 2n\}$, with $n \le 511$, contains
one of these 11 primes.
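Landau’s trick amounts to two finite checks, which this Python sketch performs (the helper name is ours). The final brute-force loop over $n \le 511$ is redundant given the trick and is included only as a cross-check.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


landau = [2, 3, 5, 7, 13, 23, 43, 83, 163, 317, 521]

assert all(is_prime(q) for q in landau)
assert all(b < 2 * a for a, b in zip(landau, landau[1:]))  # each is less than twice its predecessor

# Consequence: each interval n < y <= 2n with 1 <= n <= 511 contains one of these primes.
for n in range(1, 512):
    assert any(n < q <= 2 * n for q in landau)
```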
(Chapter 2 margin figures: a portrait of Joseph Bertrand, and a facsimile of the first page of P. Erdős, “Beweis eines Satzes von Tschebyschef” [Proof of a theorem of Chebyshev], see [1].)


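Legendre’s theorem

The number $n!$ contains the prime factor $p$ exactly
$$\sum_{k\ge1} \Big\lfloor \frac{n}{p^k} \Big\rfloor$$
times.

■ Proof. Exactly $\lfloor n/p \rfloor$ of the factors of $n! = 1 \cdot 2 \cdot 3 \cdots n$ are divisible by
$p$, which accounts for $\lfloor n/p \rfloor$ $p$-factors. Next, $\lfloor n/p^2 \rfloor$ of the factors of $n!$ are
even divisible by $p^2$, which accounts for the next $\lfloor n/p^2 \rfloor$ prime factors $p$
of $n!$, etc. □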
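Legendre’s formula translates into a short loop over the powers $p, p^2, p^3, \dots$. The following Python sketch (function names ours) compares it against the exponent of $p$ obtained by repeatedly dividing $n!$ itself.

```python
from math import factorial


def legendre_exponent(n: int, p: int) -> int:
    """Exponent of the prime p in n!, i.e. the sum over k >= 1 of floor(n / p^k)."""
    e, pk = 0, p
    while pk <= n:
        e += n // pk
        pk *= p
    return e


def exponent_by_division(m: int, p: int) -> int:
    """Exponent of p in the positive integer m, by repeated division (a cross-check)."""
    e = 0
    while m % p == 0:
        m //= p
        e += 1
    return e


for n in range(1, 60):
    for p in [2, 3, 5, 7, 11]:
        assert legendre_exponent(n, p) == exponent_by_division(factorial(n), p)

print(legendre_exponent(100, 5))  # 24, so 100! ends in 24 zeros
```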


(2) Next we prove that
$$\prod_{p \le x} p \le 4^{x-1} \qquad \text{for all real } x \ge 2, \qquad (1)$$
where our notation, here and in the following, is meant to imply that
the product is taken over all prime numbers $p \le x$. The proof that we
present for this fact uses induction on the number of these primes. It is
not from Erdős’ original paper, but it is also due to Erdős (see the margin),
and it is a true Book Proof. First we note that if $q$ is the largest prime with
$q \le x$, then
$$\prod_{p \le x} p = \prod_{p \le q} p \qquad \text{and} \qquad 4^{q-1} \le 4^{x-1}.$$
Thus it suffices to check (1) for the case where $x = q$ is a prime number. For
$q = 2$ we get “$2 \le 4$,” so we proceed to consider odd primes $q = 2m + 1$.
(Here we may assume, by induction, that (1) is valid for all integers $x$ in
the set $\{2, 3, \dots, 2m\}$.) For $q = 2m + 1$ we split the product and compute
$$\prod_{p \le 2m+1} p = \prod_{p \le m+1} p \cdot \prod_{m+1 < p \le 2m+1} p \le 4^m \binom{2m+1}{m} \le 4^m\, 2^{2m} = 4^{2m}.$$


All the pieces of this “one-line computation” are easy to see. In fact,
$$\prod_{p \le m+1} p \le 4^m$$
holds by induction. The inequality
$$\prod_{m+1 < p \le 2m+1} p \le \binom{2m+1}{m}$$
follows from the observation that $\binom{2m+1}{m} = \frac{(2m+1)!}{m!\,(m+1)!}$ is an integer, where
the primes that we consider all are factors of the numerator $(2m+1)!$, but
not of the denominator $m!\,(m+1)!$. Finally
$$\binom{2m+1}{m} \le 2^{2m}$$
holds since
$$\binom{2m+1}{m} \qquad \text{and} \qquad \binom{2m+1}{m+1}$$
are two (equal!) summands that appear in
$$\sum_{k=0}^{2m+1} \binom{2m+1}{k} = 2^{2m+1}.$$

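(3) From Legendre’s theorem (see the box) we get that $\binom{2n}{n} = \frac{(2n)!}{n!\,n!}$
contains the prime factor $p$ exactly
$$\sum_{k\ge1} \Big(\Big\lfloor \frac{2n}{p^k} \Big\rfloor - 2\Big\lfloor \frac{n}{p^k} \Big\rfloor\Big)$$
times. Here each summand is at most 1, since it satisfies
$$\Big\lfloor \frac{2n}{p^k} \Big\rfloor - 2\Big\lfloor \frac{n}{p^k} \Big\rfloor < \frac{2n}{p^k} - 2\Big(\frac{n}{p^k} - 1\Big) = 2,$$
and it is an integer. Furthermore the summands vanish whenever $p^k > 2n$.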


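Thus $\binom{2n}{n}$ contains $p$ exactly
$$\sum_{k\ge1} \Big(\Big\lfloor \frac{2n}{p^k} \Big\rfloor - 2\Big\lfloor \frac{n}{p^k} \Big\rfloor\Big) \le \max\{r : p^r \le 2n\}$$
times. Hence the largest power of $p$ that divides $\binom{2n}{n}$ is not larger than $2n$.
In particular, primes $p > \sqrt{2n}$ appear at most once in $\binom{2n}{n}$.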


Furthermore (and this, according to Erdős, is the key fact for his proof)
primes $p$ that satisfy $\frac{2}{3}n < p \le n$ do not divide $\binom{2n}{n}$ at all! Indeed,
$3p > 2n$ implies (for $n \ge 3$, and hence $p \ge 3$) that $p$ and $2p$ are the only
multiples of $p$ that appear as factors in the numerator of $\frac{(2n)!}{n!\,n!}$, while we get
two $p$-factors in the denominator.
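Step (3) and the key fact above say precisely which prime powers can occur in $\binom{2n}{n}$. The Python sketch below (names ours) factors $\binom{2n}{n}$ for $3 \le n < 300$ by repeated division and checks three statements: $p^e \le 2n$ for every prime $p$, $e = 0$ for $\frac23 n < p \le n$, and $e = 1$ for $n < p \le 2n$ (the last is immediate, since such a $p$ occurs once in $(2n)!$ and not at all in $n!$).

```python
from math import comb


def is_prime(n: int) -> bool:
    """Trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def p_exponent(m: int, p: int) -> int:
    """Exponent of the prime p in the positive integer m."""
    e = 0
    while m % p == 0:
        m //= p
        e += 1
    return e


for n in range(3, 300):
    c = comb(2 * n, n)
    for p in filter(is_prime, range(2, 2 * n + 1)):
        e = p_exponent(c, p)
        assert p ** e <= 2 * n            # the p-part of C(2n, n) is at most 2n
        if 2 * n < 3 * p and p <= n:      # the gap 2n/3 < p <= n
            assert e == 0
        if n < p:                          # primes n < p <= 2n divide exactly once
            assert e == 1
```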


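(4) Now we are ready to estimate $\binom{2n}{n}$, benefitting from a suggestion by
Raimund Seidel, which nicely improves Erdős’ original argument. For
$n \ge 3$, using an estimate from page 14 for the lower bound, we get
$$\frac{4^n}{2n} \le \binom{2n}{n} \le \prod_{p \le \sqrt{2n}} 2n \cdot \prod_{\sqrt{2n} < p \le \frac{2}{3}n} p \cdot \prod_{n < p \le 2n} p.$$
Now, there are no more than $\sqrt{2n}$ primes in the first factor; hence using (1)
for the second factor and letting $P(n)$ denote the number of primes between
$n$ and $2n$ we get
$$\frac{4^n}{2n} < (2n)^{\sqrt{2n}} \cdot 4^{\frac{2}{3}n} \cdot (2n)^{P(n)},$$
that is,
$$4^{\frac{n}{3}} < (2n)^{\sqrt{2n} + 1 + P(n)}. \qquad (2)$$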


(5) Taking the logarithm to base 2, the last inequality is turned into
$$P(n) > \frac{2n}{3\log_2(2n)} - \big(\sqrt{2n} + 1\big). \qquad (3)$$
It remains to verify that the right-hand side of (3) is positive for $n$ large
enough. We show that this is the case for $n \ge 2^9 = 512$ (actually, it holds
from $n = 468$ onward). By writing $2n - 1 = (\sqrt{2n} - 1)(\sqrt{2n} + 1)$ and
cancelling the $(\sqrt{2n} + 1)$-factor it suffices to show
$$\sqrt{2n} - 1 \ge 3\log_2(2n) \qquad \text{for } n \ge 2^9. \qquad (4)$$
For $n = 2^9$, (4) becomes $31 \ge 30$, and comparing the derivatives
$(\sqrt{x} - 1)' = \frac{1}{2\sqrt{x}}$ and $(3\log_2 x)' = \frac{3}{x \log 2}$ we see that $\sqrt{x} - 1$ grows
faster than $3\log_2 x$ for $x > \big(\frac{6}{\log 2}\big)^2 \approx 75$ and thus certainly for $x \ge 2^{10} = 1024$. □
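The arithmetic of step (5) can be double-checked in floating point. This Python sketch (the name `rhs3` is ours) evaluates (4) and the right-hand side of (3) over a finite range and also tests the remark about $n = 468$; floating-point evaluation is assumed accurate enough here, since the relevant margins (about $0.01$ at $n = 467, 468$) are far above rounding error.

```python
import math


def rhs3(n: int) -> float:
    """Right-hand side of (3): 2n / (3 log2(2n)) - (sqrt(2n) + 1)."""
    return 2 * n / (3 * math.log2(2 * n)) - (math.sqrt(2 * n) + 1)


# (4) at n = 2^9: sqrt(1024) - 1 = 31 and 3 * log2(1024) = 30.
assert math.sqrt(2 * 2 ** 9) - 1 == 31 and 3 * math.log2(2 * 2 ** 9) == 30

# (4) and the positivity of (3) on a finite range n >= 2^9 (a spot check, not a proof).
for n in range(2 ** 9, 20_000):
    assert math.sqrt(2 * n) - 1 >= 3 * math.log2(2 * n)
    assert rhs3(n) > 0

# The remark in the text: (3) is positive from n = 468 onward, but not yet at n = 467.
assert rhs3(467) < 0 < rhs3(468)
assert all(rhs3(n) > 0 for n in range(468, 2 ** 9))
```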


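Examples such as
$$\binom{26}{13} = 2^3 \cdot 5^2 \cdot 7 \cdot 17 \cdot 19 \cdot 23,$$
$$\binom{28}{14} = 2^3 \cdot 3^3 \cdot 5^2 \cdot 17 \cdot 19 \cdot 23,$$
$$\binom{30}{15} = 2^4 \cdot 3^2 \cdot 5 \cdot 17 \cdot 19 \cdot 23 \cdot 29$$
illustrate that “very small” prime factors $p < \sqrt{2n}$ can appear as higher powers
in $\binom{2n}{n}$, “small” primes with $\sqrt{2n} < p \le \frac{2}{3}n$ appear at most once, while
factors in the gap with $\frac{2}{3}n < p \le n$ don’t appear at all.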


One can extract even more from this type of estimates: Comparing the
derivatives of both sides, one can sharpen (4) to
$$\sqrt{2n} - 1 \ge \frac{21}{4}\log_2(2n) \qquad \text{for } n \ge 2^{11},$$
which with a little arithmetic and (3) implies
$$P(n) \ge \frac{2}{7}\,\frac{n}{\log_2(2n)}.$$
This is not that bad an estimate: the “true” number of primes in this range
is roughly $n/\log n$. This follows from the “prime number theorem,” which
says that the limit
$$\lim_{n \to \infty} \frac{\#\{p \le n : p \text{ is prime}\}}{n/\log n}$$
exists, and equals 1. This famous result was first proved by Hadamard and
de la Vallée-Poussin in 1896; Selberg and Erdős found an elementary proof
(without complex analysis tools, but still long and involved) in 1948.
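The sharpened bound on $P(n)$ can be compared with actual prime counts. In this Python sketch (names ours) the primes are sieved up to $2 \cdot 10^4$, $P(n)$ counts the primes with $n < p \le 2n$, the bound is checked for $2^{11} \le n \le 10^4$ only, and the printed values put $P(n)$ next to $n/\log n$.

```python
import math


def prime_counts(limit: int) -> list[int]:
    """counts[x] = number of primes p <= x, for 0 <= x <= limit (sieve of Eratosthenes)."""
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if flags[i]:
            for j in range(i * i, limit + 1, i):
                flags[j] = False
    counts, running = [], 0
    for flag in flags:
        running += flag
        counts.append(running)
    return counts


N_MAX = 10_000
pi = prime_counts(2 * N_MAX)


def P(n: int) -> int:
    """Number of primes p with n < p <= 2n."""
    return pi[2 * n] - pi[n]


for n in range(2 ** 11, N_MAX + 1):
    assert P(n) >= 2 / 7 * n / math.log2(2 * n)

for n in [1_000, 5_000, 10_000]:
    print(n, P(n), round(n / math.log(n)))
```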


On the prime number theorem itself the final word, it seems, is still not in:
for example a proof of the Riemann hypothesis (see page 64), one of the
major unsolved open problems in mathematics, would also give a substantial
improvement for the estimates of the prime number theorem. But also
for Bertrand’s postulate, one could expect dramatic improvements. In fact,
the following is a famous unsolved problem:

Is there always a prime between $n^2$ and $(n+1)^2$?

For additional information see [3, p. 19] and [4, pp. 248, 257].
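No finite computation settles this problem, but small cases are easy to confirm. The Python sketch below (names ours) checks that every interval between $n^2$ and $(n+1)^2$ contains a prime for $n \le 300$, an arbitrary cutoff.

```python
import math


def prime_flags(limit: int) -> list[bool]:
    """Sieve of Eratosthenes: flags[i] is True iff i is prime, for 0 <= i <= limit."""
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if flags[i]:
            for j in range(i * i, limit + 1, i):
                flags[j] = False
    return flags


N = 300
is_p = prime_flags((N + 1) ** 2)
for n in range(1, N + 1):
    assert any(is_p[m] for m in range(n * n + 1, (n + 1) ** 2))
```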


Appendix: Some estimates 


Estimating via integrals 


There is a very simple-but-effective method of estimating sums by integrals
(as already encountered on page 4). For estimating the harmonic numbers
$$H_n = \sum_{k=1}^{n} \frac{1}{k}$$
we draw the figure in the margin and derive from it
$$H_n - 1 = \sum_{k=2}^{n} \frac{1}{k} < \int_1^n \frac{1}{t}\,dt = \log n$$
by comparing the area below the graph of $f(t) = \frac{1}{t}$ ($1 \le t \le n$) with the
area of the dark shaded rectangles, and
$$H_n - \frac{1}{n} = \sum_{k=1}^{n-1} \frac{1}{k} > \int_1^n \frac{1}{t}\,dt = \log n$$
by comparing with the area of the large rectangles (including the lightly
shaded parts). Taken together, this yields
$$\log n + \frac{1}{n} < H_n < \log n + 1.$$
In particular, $\lim_{n\to\infty} H_n = \infty$, and the order of growth of $H_n$ is given by
$\lim_{n\to\infty} \frac{H_n}{\log n} = 1$. But much better estimates are known (see [2]), such as
$$H_n = \log n + \gamma + \frac{1}{2n} - \frac{1}{12n^2} + \frac{1}{120n^4} + O\Big(\frac{1}{n^6}\Big),$$
where $\gamma \approx 0.5772$ is “Euler’s constant.” (Here $O\big(\frac{1}{n^6}\big)$ denotes a function $f(n)$
such that $f(n) \le c\,\frac{1}{n^6}$ holds for some constant $c$.)
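Both the elementary bounds and the refined expansion for $H_n$ can be compared numerically. In this Python sketch (function names ours) the constant $\gamma$ is hard-coded to double precision, and `math.fsum` is used to keep the summation error small.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant to double precision


def harmonic(n: int) -> float:
    """H_n = 1 + 1/2 + ... + 1/n, summed with math.fsum for accuracy."""
    return math.fsum(1 / k for k in range(1, n + 1))


def harmonic_expansion(n: int) -> float:
    """log n + gamma + 1/(2n) - 1/(12n^2) + 1/(120n^4), the expansion quoted above."""
    return (math.log(n) + EULER_GAMMA + 1 / (2 * n)
            - 1 / (12 * n ** 2) + 1 / (120 * n ** 4))


# The elementary bounds log n + 1/n < H_n < log n + 1 (for n >= 2).
for n in range(2, 1_000):
    assert math.log(n) + 1 / n < harmonic(n) < math.log(n) + 1

for n in [10, 100, 1_000]:
    print(n, harmonic(n), abs(harmonic(n) - harmonic_expansion(n)))
```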


Estimating factorials — Stirling’s formula 


The same method applied to
$$\log(n!) = \log 2 + \log 3 + \dots + \log n = \sum_{k=2}^{n} \log k$$
yields
$$\log((n-1)!) < \int_1^n \log t\,dt < \log(n!),$$
where the integral is easily computed:
$$\int_1^n \log t\,dt = \Big[t\log t - t\Big]_1^n = n\log n - n + 1.$$
Thus we get a lower estimate on $n!$
$$n! > e^{n\log n - n + 1} = e\Big(\frac{n}{e}\Big)^n$$
and at the same time an upper estimate
$$n! = n\,(n-1)! < n\,e^{n\log n - n + 1} = en\Big(\frac{n}{e}\Big)^n.$$
Here a more careful analysis is needed to get the asymptotics of $n!$, as given
by Stirling’s formula
$$n! \sim \sqrt{2\pi n}\Big(\frac{n}{e}\Big)^n.$$
(Here $f(n) \sim g(n)$ means that $\lim_{n\to\infty} \frac{f(n)}{g(n)} = 1$.)
And again there are more precise versions available, such as
$$n! = \sqrt{2\pi n}\Big(\frac{n}{e}\Big)^n\Big(1 + \frac{1}{12n} + \frac{1}{288n^2} - \frac{139}{51840n^3} + O\Big(\frac{1}{n^4}\Big)\Big).$$
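Similarly for factorials: the Python sketch below (function names ours) checks the elementary bounds $e(n/e)^n < n! < en(n/e)^n$ for $2 \le n < 100$ and prints the ratios of Stirling’s formula, with and without the correction terms quoted above, to the true value of $n!$.

```python
import math


def stirling(n: int) -> float:
    """sqrt(2 pi n) (n/e)^n, the leading term of Stirling's formula."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n


def stirling_series(n: int) -> float:
    """Stirling's formula with the correction terms up to -139/(51840 n^3)."""
    correction = 1 + 1 / (12 * n) + 1 / (288 * n ** 2) - 139 / (51840 * n ** 3)
    return stirling(n) * correction


# The elementary bounds e (n/e)^n < n! < e n (n/e)^n.
for n in range(2, 100):
    lower = math.e * (n / math.e) ** n
    assert lower < math.factorial(n) < n * lower

for n in [5, 10, 50]:
    f = math.factorial(n)
    print(n, stirling(n) / f, stirling_series(n) / f)
```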


Estimating binomial coefficients 


Just from the definition of the binomial coefficients $\binom{n}{k}$ as the number of
$k$-subsets of an $n$-set, we know that the sequence $\binom{n}{0}, \binom{n}{1}, \dots, \binom{n}{n}$ of
binomial coefficients
• sums to $\sum_{k=0}^{n} \binom{n}{k} = 2^n$,
• is symmetric: $\binom{n}{k} = \binom{n}{n-k}$.

                1
              1   1
            1   2   1
          1   3   3   1
        1   4   6   4   1
      1   5  10  10   5   1
    1   6  15  20  15   6   1
  1   7  21  35  35  21   7   1

Pascal’s triangle


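From the functional equation $\binom{n}{k} = \frac{n-k+1}{k}\binom{n}{k-1}$ one easily finds that for
every $n$ the binomial coefficients $\binom{n}{k}$ form a sequence that is symmetric
and unimodal: it increases towards the middle, so that the middle binomial
coefficients are the largest ones in the sequence:
$$1 = \binom{n}{0} < \binom{n}{1} < \dots < \binom{n}{\lfloor n/2 \rfloor} = \binom{n}{\lceil n/2 \rceil} > \dots > \binom{n}{n-1} > \binom{n}{n} = 1.$$
Here $\lfloor x \rfloor$ resp. $\lceil x \rceil$ denotes the number $x$ rounded down resp. rounded up
to the nearest integer.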


From the asymptotic formulas for the factorials mentioned above one can
obtain very precise estimates for the sizes of binomial coefficients.
However, we will only need very weak and simple estimates in this book, such
as the following: $\binom{n}{k} \le 2^n$ for all $k$, while for $n \ge 2$ we have
$$\binom{n}{\lfloor n/2 \rfloor} \ge \frac{2^n}{n},$$
with equality only for $n = 2$. In particular, for $n \ge 1$,
$$\binom{2n}{n} \ge \frac{4^n}{2n}.$$
This holds since $\binom{n}{\lfloor n/2 \rfloor}$, a middle binomial coefficient, is the largest entry
in the sequence $\binom{n}{0} + \binom{n}{n}, \binom{n}{1}, \binom{n}{2}, \dots, \binom{n}{n-1}$, whose sum is $2^n$, and whose
average is thus $\frac{2^n}{n}$.


On the other hand, we note the upper bound for binomial coefficients
$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} \le \frac{n^k}{k!} \le \frac{n^k}{2^{k-1}},$$
which is a reasonably good estimate for the “small” binomial coefficients
at the tails of the sequence, when $n$ is large (compared to $k$).
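The facts about binomial coefficients collected above are all finite statements for each fixed $n$, so they can be checked exactly with Python’s `math.comb` for small $n$; the cutoff $n < 150$ in this sketch is arbitrary.

```python
from math import comb, factorial

for n in range(1, 150):
    row = [comb(n, k) for k in range(n + 1)]
    mid = n // 2
    assert sum(row) == 2 ** n                              # sums to 2^n
    assert row == row[::-1]                                # symmetric
    assert all(row[k] < row[k + 1] for k in range(mid))   # increasing up to the middle
    assert max(row) == row[mid]                            # the middle entry is largest
    assert 2 * n * comb(2 * n, n) >= 4 ** n                # C(2n, n) >= 4^n / (2n)
    if n >= 2:
        assert n * row[mid] >= 2 ** n                      # C(n, floor(n/2)) >= 2^n / n
        assert (n * row[mid] == 2 ** n) == (n == 2)        # equality only for n = 2
    for k in range(1, n + 1):
        # C(n, k) <= n^k / k! <= n^k / 2^(k-1), in exact integer arithmetic
        assert comb(n, k) * factorial(k) <= n ** k
        assert factorial(k) >= 2 ** (k - 1)
```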


References 


[1] P. ERDŐS: Beweis eines Satzes von Tschebyschef, Acta Sci. Math. (Szeged) 5
(1930-32), 194-198.


[2] R. L. GRAHAM, D. E. KNUTH & O. PATASHNIK: Concrete Mathematics. 
A Foundation for Computer Science, Addison-Wesley, Reading MA 1989. 


[3] G. H. HARDY & E. M. WRIGHT: An Introduction to the Theory of Numbers, 
Fifth edition, Oxford University Press 1979. 


[4] P. RIBENBOIM: The New Book of Prime Number Records, Springer-Verlag, 
New York 1989. 


Binomial coefficients 
are (almost) never powers 


There is an epilogue to Bertrand’s postulate which leads to a beautiful
result on binomial coefficients. In 1892 Sylvester strengthened Bertrand’s
postulate in the following way:

If $n \ge 2k$, then at least one of the numbers $n, n-1, \dots, n-k+1$
has a prime divisor $p$ greater than $k$.


Note that for $n = 2k$ we obtain precisely Bertrand’s postulate. In 1934,
Erdős gave a short and elementary Book Proof of Sylvester’s result, running
along the lines of his proof of Bertrand’s postulate. There is an equivalent
way of stating Sylvester’s theorem:

The binomial coefficient
$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} \qquad (n \ge 2k)$$
always has a prime factor $p > k$.
With this observation in mind, we turn to another one of Erdős’ jewels:
When is $\binom{n}{k}$ equal to a power $m^\ell$?
The case $k = \ell = 2$ leads to a classical topic. Multiplying $\binom{n}{2} = m^2$
by 8 and rearranging terms gives $(2n-1)^2 - 2(2m)^2 = 1$, which is a
special case of Pell’s equation, $x^2 - 2y^2 = 1$. One learns in number theory
that this equation has infinitely many positive solutions $(x_k, y_k)$, which are
given by $x_k + y_k\sqrt{2} = (3 + 2\sqrt{2})^k$ for $k \ge 1$. The smallest examples are
$(x_1, y_1) = (3, 2)$, $(x_2, y_2) = (17, 12)$, and $(x_3, y_3) = (99, 70)$, yielding
$\binom{2}{2} = 1^2$, $\binom{9}{2} = 6^2$, and $\binom{50}{2} = 35^2$.
For $k = 2$ and $\ell > 2$ there are no further solutions, and for $k = 3$ it is
known that $\binom{n}{3} = m^\ell$ has the unique solution $n = 50$, $m = 140$, $\ell = 2$, see
Győry [3]. But now we are at the end of the line. For $k \ge 4$ and any $\ell \ge 2$
no solutions exist, and this is what Erdős proved by an ingenious argument.
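The Pell-equation solutions and the resulting square binomial coefficients can be generated by repeatedly multiplying by $3 + 2\sqrt{2}$ in integer arithmetic, as in this Python sketch (the function name is ours).

```python
from math import comb


def pell_solutions(count: int) -> list[tuple[int, int]]:
    """First `count` positive solutions of x^2 - 2y^2 = 1, via x_k + y_k*sqrt(2) = (3 + 2*sqrt(2))^k."""
    solutions, x, y = [], 1, 0
    for _ in range(count):
        # (x + y*sqrt(2)) * (3 + 2*sqrt(2)) = (3x + 4y) + (2x + 3y)*sqrt(2)
        x, y = 3 * x + 4 * y, 2 * x + 3 * y
        solutions.append((x, y))
    return solutions


for x, y in pell_solutions(8):
    assert x * x - 2 * y * y == 1
    n, m = (x + 1) // 2, y // 2  # x = 2n - 1 and y = 2m
    assert comb(n, 2) == m * m
    print(f"C({n}, 2) = {m}^2")

assert comb(50, 3) == 140 ** 2  # the k = 3 solution quoted in the text
```

The printed lines begin with `C(2, 2) = 1^2`, `C(9, 2) = 6^2` and `C(50, 2) = 35^2`.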


Theorem. The equation $\binom{n}{k} = m^\ell$ has no integer solutions with
$\ell \ge 2$ and $4 \le k \le n - 4$.


■ Proof. Note first that we may assume $n \ge 2k$ because of $\binom{n}{k} = \binom{n}{n-k}$.
Suppose the theorem is false, and that $\binom{n}{k} = m^\ell$. The proof, by
contradiction, proceeds in the following four steps.


(1) By Sylvester’s theorem, there is a prime factor $p$ of $\binom{n}{k}$ greater than $k$,
hence $p^\ell$ divides $n(n-1)\cdots(n-k+1)$. Clearly, only one of the factors
$n - i$ can be a multiple of any such $p > k$, and we conclude $p^\ell \mid n - i$, and
therefore
$$n \ge p^\ell > k^\ell \ge k^2.$$


(2) Consider any factor $n - j$ of the numerator and write it in the form
$n - j = a_j m_j^\ell$, where $a_j$ is not divisible by any nontrivial $\ell$-th power. We
note by (1) that $a_j$ has only prime divisors less than or equal to $k$. We want
to show next that $a_i \ne a_j$ for $i \ne j$. Assume to the contrary that $a_i = a_j$
for some $i < j$. Then $m_i \ge m_j + 1$ and
$$\begin{aligned}
k &> (n-i) - (n-j) = a_j\big(m_i^\ell - m_j^\ell\big) \ge a_j\big((m_j+1)^\ell - m_j^\ell\big)\\
&> a_j\,\ell\, m_j^{\ell-1} \ge \ell\big(a_j m_j^\ell\big)^{1/2} \ge \ell\,(n-k+1)^{1/2}\\
&\ge \ell\big(\tfrac{n}{2} + 1\big)^{1/2} > n^{1/2},
\end{aligned}$$
which contradicts $n > k^2$ from above.


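(3) Next we prove that the $a_i$’s are the integers $1, 2, \ldots, k$ in some order. (According to Erdős, this is the crux of the proof.) Since we already know that they are all distinct, it suffices to prove that

$$a_0 a_1 \cdots a_{k-1} \quad\text{divides}\quad k!.$$

(The product of $k$ distinct positive integers is at least $k!$, with equality only for $\{1, 2, \ldots, k\}$.)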


Substituting $n - j = a_j m_j^\ell$ into the equation $\binom{n}{k} = m^\ell$, we obtain

$$a_0 a_1 \cdots a_{k-1}\,(m_0 m_1 \cdots m_{k-1})^\ell = k!\, m^\ell.$$

Cancelling the common factors of $m_0 \cdots m_{k-1}$ and $m$ yields

$$a_0 a_1 \cdots a_{k-1}\, u^\ell = k!\, v^\ell$$

with $\gcd(u, v) = 1$. It remains to show that $v = 1$. If not, then $v$ contains a prime divisor $p$. Since $\gcd(u, v) = 1$, $p$ must be a prime divisor of $a_0 a_1 \cdots a_{k-1}$ and hence is less than or equal to $k$. By the theorem of Legendre (see page 8) we know that $k!$ contains $p$ to the power $\sum_{i \ge 1}\lfloor k/p^i \rfloor$. We now estimate the exponent of $p$ in $n(n-1)\cdots(n-k+1)$. Let $i$ be a positive integer, and let $b_1 < b_2 < \cdots < b_s$ be the multiples of $p^i$ among $n, n-1, \ldots, n-k+1$. Then $b_s = b_1 + (s-1)p^i$ and hence

$$(s-1)p^i = b_s - b_1 \le n - (n-k+1) = k - 1,$$

which implies

$$s \le \Big\lfloor \frac{k-1}{p^i} \Big\rfloor + 1 \le \Big\lfloor \frac{k}{p^i} \Big\rfloor + 1.$$
So for each $i$ the number of multiples of $p^i$ among $n, \ldots, n-k+1$, and hence among the $a_j$’s, is bounded by $\lfloor k/p^i \rfloor + 1$. This implies that the exponent of $p$ in $a_0 a_1 \cdots a_{k-1}$ is at most

$$\sum_{i=1}^{\ell-1}\Big(\Big\lfloor \frac{k}{p^i} \Big\rfloor + 1\Big)$$

with the reasoning that we used for Legendre’s theorem in Chapter 2. The only difference is that this time the sum stops at $i = \ell - 1$, since the $a_j$’s contain no $\ell$-th powers.

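Taking both counts together, we find that the exponent of $p$ in $v^\ell$ is at most

$$\sum_{i=1}^{\ell-1}\Big(\Big\lfloor \frac{k}{p^i} \Big\rfloor + 1\Big) - \sum_{i \ge 1}\Big\lfloor \frac{k}{p^i} \Big\rfloor \;\le\; \ell - 1,$$

and we have our desired contradiction, since $v^\ell$ is an $\ell$-th power.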


This suffices already to settle the case $\ell = 2$. Indeed, since $k \ge 4$ one of the $a_i$’s must be equal to 4, but the $a_i$’s contain no squares. So let us now assume that $\ell \ge 3$.


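(4) Since $k \ge 4$, we must have $a_{i_1} = 1$, $a_{i_2} = 2$, $a_{i_3} = 4$ for some $i_1, i_2, i_3$, that is (writing $m_1, m_2, m_3$ for $m_{i_1}, m_{i_2}, m_{i_3}$),

$$n - i_1 = m_1^\ell, \qquad n - i_2 = 2m_2^\ell, \qquad n - i_3 = 4m_3^\ell.$$

We claim that $(n - i_2)^2 \ne (n - i_1)(n - i_3)$. If not, put $b = n - i_2$ and $n - i_1 = b - x$, $n - i_3 = b + y$, where $0 < |x|, |y| < k$. Hence

$$b^2 = (b - x)(b + y) \quad\text{or}\quad (y - x)\,b = xy,$$

where $x = y$ is plainly impossible. Now we have by part (1)

$$|xy| = b\,|y - x| \ge b > n - k > (k-1)^2 \ge |xy|,$$

which is absurd.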


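So we have $m_2^2 \ne m_1 m_3$, where we assume $m_2^2 > m_1 m_3$ (the other case being analogous), and proceed to our last chains of inequalities. We obtain

$$2(k-1)n > n^2 - (n-k+1)^2 \ge (n - i_2)^2 - (n - i_1)(n - i_3) = 4\big[m_2^{2\ell} - (m_1 m_3)^\ell\big] \ge 4\big[(m_1 m_3 + 1)^\ell - (m_1 m_3)^\ell\big] \ge 4\ell\, m_1^{\ell-1} m_3^{\ell-1}.$$

Since $\ell \ge 3$ and $n > k^\ell \ge k^3 > 6k$, this yields

$$2(k-1)n\, m_1 m_3 > 4\ell\, m_1^\ell m_3^\ell = \ell(n - i_1)(n - i_3) > \ell(n - k + 1)^2 > 3\big(n - \tfrac{n}{6}\big)^2 > 2n^2.$$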


We see that our analysis so far agrees with $\binom{50}{3} = 140^2$, as

$$50 = 2 \cdot 5^2, \qquad 49 = 1 \cdot 7^2, \qquad 48 = 3 \cdot 4^2,$$

and $5 \cdot 7 \cdot 4 = 140$.
Now since $m_i \le n^{1/\ell} \le n^{1/3}$, dividing the last chain by $2n$ we finally obtain

$$k\, n^{2/3} \ge k\, m_1 m_3 > (k-1)\, m_1 m_3 > n,$$

or $k^3 > n$. With this contradiction, the proof is complete.
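A brute-force search is no substitute for the proof, but it is a cheap sanity check of the theorem and of the bookkeeping in steps (2) and (3). The sketch below uses helper names of our own:

- `split_power_free` writes $N = a\,m^\ell$ with $a$ free of nontrivial $\ell$-th powers. It reproduces the margin decomposition of $50, 49, 48$.
- `perfect_power` tests exact integer roots for prime exponents only. This suffices, because an $(ab)$-th power is also an $a$-th power.

The search range $n \le 60$ is an arbitrary choice to keep the check fast.

```python
from math import comb, prod


def split_power_free(N, l):
    """Write N = a * m**l with a not divisible by any nontrivial l-th power."""
    a, m, d = 1, 1, 2
    while d * d <= N:
        e = 0
        while N % d == 0:
            N //= d
            e += 1
        m *= d ** (e // l)
        a *= d ** (e % l)
        d += 1
    return a * N, m  # a leftover N > 1 is a prime with exponent 1 < l


def integer_root(N, l):
    """Largest r with r**l <= N, by binary search on integers (no floats)."""
    lo, hi = 0, 1 << (N.bit_length() // l + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** l <= N:
            lo = mid
        else:
            hi = mid - 1
    return lo


def perfect_power(N):
    """Return (m, l) with N == m**l for a prime l >= 2, or None."""
    for l in range(2, max(2, N.bit_length()) + 1):
        if all(l % d for d in range(2, l)):  # prime exponents suffice
            r = integer_root(N, l)
            if r ** l == N:
                return r, l
    return None


def counterexamples(n_max):
    """All (n, k) with 4 <= k <= n - 4, n <= n_max, and C(n, k) a perfect power."""
    return [(n, k) for n in range(8, n_max + 1)
            for k in range(4, n // 2 + 1)  # symmetry C(n, k) = C(n, n - k)
            if perfect_power(comb(n, k)) is not None]


# Steps (2) and (3) for the k = 3 solution C(50, 3) = 140^2:
parts = [split_power_free(50 - j, 2) for j in range(3)]
print(parts)                       # [(2, 5), (1, 7), (3, 4)]
print(prod(m for _, m in parts))   # 140
print(perfect_power(comb(50, 3)))  # (140, 2)
print(counterexamples(60))         # []
```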
Chapter 4

Representing numbers as sums of two squares


Which numbers can be written as sums of two squares? 


This question is as old as number theory, and its solution is a classic in the 
field. The “hard” part of the solution is to see that every prime number of 
the form 4m + 1 is a sum of two squares. G. H. Hardy writes that this 
two square theorem of Fermat “is ranked, very justly, as one of the finest in 
arithmetic.” Nevertheless, one of our Book Proofs below is quite recent. 




Let’s start with some “warm-ups.” First, we need to distinguish between the prime $p = 2$, the primes of the form $p = 4m + 1$, and the primes of the form $p = 4m + 3$. Every prime number belongs to exactly one of these three classes. At this point we may note (using a method “à la Euclid”) that there are infinitely many primes of the form $4m + 3$. In fact, if there were only finitely many, then we could take $p_k$ to be the largest prime of this form. Setting

$$N_k := 2^2 \cdot 3 \cdot 5 \cdots p_k - 1$$

(where $p_1 = 2$, $p_2 = 3$, $p_3 = 5, \ldots$ denotes the sequence of all primes), we find that $N_k$ is congruent to $3 \pmod 4$, so it must have a prime factor of the form $4m + 3$, and this prime factor is larger than $p_k$: contradiction.
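Run forward, the same construction produces a prime of the form $4m + 3$ beyond any given bound. In the sketch below, `bound` plays the role of $p_k$. The function names are our own, and trial division is only meant for small bounds.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def prime_factors(n):
    """Prime factors of n (with multiplicity) by trial division."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def new_prime_3_mod_4(bound):
    """Form N = 2^2 * 3 * 5 * ... * p - 1 over all primes p <= bound and return
    a prime factor of N congruent to 3 mod 4 (it is necessarily > bound)."""
    N = 4
    for p in primes_up_to(bound)[1:]:  # 2 is already accounted for by 2^2
        N *= p
    N -= 1  # now N = 3 (mod 4), so not all prime factors can be 1 (mod 4)
    return next(f for f in prime_factors(N) if f % 4 == 3)


print(new_prime_3_mod_4(11))  # N = 4619 = 31 * 149, so this prints 31
```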


Our first lemma characterizes the primes for which $-1$ is a square in the field $\mathbb{Z}_p$ (which is reviewed in the box on the next page). It will also give us a quick way to derive that there are infinitely many primes of the form $4m + 1$.

[Portrait: Pierre de Fermat]

Lemma 1. For primes $p = 4m + 1$ the equation $s^2 \equiv -1 \pmod p$ has two solutions $s \in \{1, 2, \ldots, p-1\}$, for $p = 2$ there is one such solution, while for primes of the form $p = 4m + 3$ there is no solution.


Proof. For $p = 2$ take $s = 1$. For odd $p$, we construct the equivalence relation on $\{1, 2, \ldots, p-1\}$ that is generated by identifying every element with its additive inverse and with its multiplicative inverse in $\mathbb{Z}_p$. Thus the “general” equivalence classes will contain four elements

$$\{x, -x, \bar{x}, -\bar{x}\}$$

since such a 4-element set contains both inverses for all its elements. However, there are smaller equivalence classes if some of the four numbers are not distinct:

• $x \equiv -x$ is impossible for odd $p$.

• $x \equiv \bar{x}$ is equivalent to $x^2 \equiv 1$. This has two solutions, namely $x = 1$ and $x = p - 1$, leading to the equivalence class $\{1, p-1\}$ of size 2.

• $x \equiv -\bar{x}$ is equivalent to $x^2 \equiv -1$. This equation may have no solution or two distinct solutions $x_0, p - x_0$: in this case the equivalence class is $\{x_0, p - x_0\}$.

The set $\{1, 2, \ldots, p-1\}$ has $p - 1$ elements, and we have partitioned it into quadruples (equivalence classes of size 4), plus one or two pairs (equivalence classes of size 2). For $p - 1 = 4m + 2$ we find that there is only the one pair $\{1, p-1\}$, the rest is quadruples, and thus $s^2 \equiv -1 \pmod p$ has no solution. For $p - 1 = 4m$ there has to be the second pair, and this contains the two solutions of $s^2 \equiv -1$ that we were looking for.

For $p = 11$ the partition is $\{1, 10\}, \{2, 9, 6, 5\}, \{3, 8, 4, 7\}$; for $p = 13$ it is $\{1, 12\}, \{2, 11, 7, 6\}, \{3, 10, 9, 4\}, \{5, 8\}$: the pair $\{5, 8\}$ yields the two solutions of $s^2 \equiv -1 \pmod{13}$.

Addition and multiplication in $\mathbb{Z}_5$:

+ | 0 1 2 3 4
--+----------
0 | 0 1 2 3 4
1 | 1 2 3 4 0
2 | 2 3 4 0 1
3 | 3 4 0 1 2
4 | 4 0 1 2 3

· | 0 1 2 3 4
--+----------
0 | 0 0 0 0 0
1 | 0 1 2 3 4
2 | 0 2 4 1 3
3 | 0 3 1 4 2
4 | 0 4 3 2 1
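The partition used in the proof of Lemma 1 can be computed directly. The following is a minimal sketch with function names of our own. It assumes Python 3.8 or later, where `pow(x, -1, p)` returns the multiplicative inverse modulo $p$.

```python
def inverse_classes(p):
    """Partition {1, ..., p-1} into the classes {x, -x, x^-1, -x^-1} modulo an odd prime p."""
    seen, classes = set(), []
    for x in range(1, p):
        if x in seen:
            continue
        inv = pow(x, -1, p)  # multiplicative inverse modulo p (Python 3.8+)
        cls = frozenset({x, p - x, inv, p - inv})
        seen |= cls
        classes.append(cls)
    return classes


def roots_of_minus_one(p):
    """Solutions of s^2 = -1 (mod p): the 2-element class other than {1, p-1}, if any."""
    pairs = [c for c in inverse_classes(p) if len(c) == 2 and c != {1, p - 1}]
    return sorted(pairs[0]) if pairs else []


print(roots_of_minus_one(13), roots_of_minus_one(11))  # [5, 8] []
```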


Lemma 1 says that every odd prime dividing a number $M^2 + 1$ must be of the form $4m + 1$. This implies that there are infinitely many primes of this form: Otherwise, look at $(2 \cdot 3 \cdot 5 \cdots q_k)^2 + 1$, where $q_k$ is the largest such prime. The same reasoning as above yields a contradiction.


Prime fields

If $p$ is a prime, then the set $\mathbb{Z}_p = \{0, 1, \ldots, p-1\}$ with addition and multiplication defined “modulo $p$” forms a finite field. We will need the following simple properties:

• For $x \in \mathbb{Z}_p$, $x \ne 0$, the additive inverse (for which we usually write $-x$) is given by $p - x \in \{1, 2, \ldots, p-1\}$. If $p > 2$, then $x$ and $-x$ are different elements of $\mathbb{Z}_p$.

• Each $x \in \mathbb{Z}_p \setminus \{0\}$ has a unique multiplicative inverse $\bar{x} \in \mathbb{Z}_p \setminus \{0\}$, with $x\bar{x} \equiv 1 \pmod p$.
The definition of primes implies that the map $\mathbb{Z}_p \to \mathbb{Z}_p$, $z \mapsto xz$ is injective for $x \ne 0$. Thus on the finite set $\mathbb{Z}_p \setminus \{0\}$ it must be surjective as well, and hence for each $x$ there is a unique $\bar{x} \ne 0$ with $x\bar{x} \equiv 1 \pmod p$.

• The squares $0^2, 1^2, 2^2, \ldots, h^2$ define different elements of $\mathbb{Z}_p$, for $h = \lfloor p/2 \rfloor$.
This is since $x^2 \equiv y^2$, or $(x + y)(x - y) \equiv 0$, implies that $x \equiv y$ or that $x \equiv -y$; for $0 \le y < x \le h$ the second case is impossible because $0 < x + y < p$. The $1 + \lfloor p/2 \rfloor$ elements $0^2, 1^2, \ldots, h^2$ are called the squares in $\mathbb{Z}_p$.


At this point, let us note “on the fly” that for all primes there are solutions for $x^2 + y^2 \equiv -1 \pmod p$. In fact, there are $\lfloor p/2 \rfloor + 1$ distinct squares $x^2$ in $\mathbb{Z}_p$, and there are $\lfloor p/2 \rfloor + 1$ distinct numbers of the form $-(1 + y^2)$. These two sets of numbers are too large to be disjoint, since $\mathbb{Z}_p$ has only $p$ elements, and thus there must exist $x$ and $y$ with $x^2 \equiv -(1 + y^2) \pmod p$.
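The counting argument translates directly into a search that is guaranteed to succeed. The sketch below uses a function name of our own. It stores the $\lfloor p/2 \rfloor + 1$ squares in a dictionary and looks for a collision with the numbers $-(1 + y^2)$.

```python
def minus_one_as_two_squares(p):
    """Find (x, y) with x^2 + y^2 = -1 (mod p) for a prime p.

    The p//2 + 1 squares x^2 and the p//2 + 1 numbers -(1 + y^2) cannot be
    disjoint inside Z_p, which has only p elements."""
    h = p // 2
    root = {x * x % p: x for x in range(h + 1)}  # square -> one square root
    for y in range(h + 1):
        target = -(1 + y * y) % p
        if target in root:
            return root[target], y
    raise AssertionError("unreachable by the pigeonhole argument")


print(minus_one_as_two_squares(7))  # (3, 2): 9 + 4 = 13 = -1 (mod 7)
```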
Lemma 2. No number $n = 4m + 3$ is a sum of two squares.

Proof. The square of any even number is $(2k)^2 = 4k^2 \equiv 0 \pmod 4$, while squares of odd numbers yield $(2k+1)^2 = 4(k^2 + k) + 1 \equiv 1 \pmod 4$. Thus any sum of two squares is congruent to 0, 1 or 2 $\pmod 4$.
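The proof of Lemma 2 is a finite check modulo 4, and the lemma itself can be confirmed by brute force on a range. In the sketch below, the helper `is_sum_of_two_squares` is our own and the range bound 2000 is arbitrary.

```python
from math import isqrt

# Squares are 0 or 1 modulo 4, so a sum of two squares is 0, 1 or 2 modulo 4.
square_classes = {x * x % 4 for x in range(4)}
sum_classes = {(a + b) % 4 for a in square_classes for b in square_classes}
print(square_classes, sum_classes)  # {0, 1} {0, 1, 2}


def is_sum_of_two_squares(n):
    """Brute force: is n = x^2 + y^2 with integers x, y >= 0?"""
    return any(isqrt(n - x * x) ** 2 == n - x * x for x in range(isqrt(n) + 1))


# Lemma 2 checked directly for n = 3, 7, 11, ... below 2000:
print(any(is_sum_of_two_squares(n) for n in range(3, 2000, 4)))  # False
```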


This is enough evidence for us that the primes $p = 4m + 3$ are “bad.” Thus, we proceed with “good” properties for primes of the form $p = 4m + 1$. On the way to the main theorem, the following is the key step.

Proposition. Every prime of the form $p = 4m + 1$ is a sum of two squares, that is, it can be written as $p = x^2 + y^2$ for some natural numbers $x, y \in \mathbb{N}$.


We shall present here two proofs of this result — both of them elegant and 
surprising. The first proof features a striking application of the “pigeon- 
hole principle” (which we have already used “on the fly” before Lemma 2; 
see Chapter 28 for more), as well as a clever move to arguments “modulo p” 
and back. The idea is due to the Norwegian number theorist Axel Thue. 


Proof. Consider the pairs $(x', y')$ of integers with $0 \le x', y' \le \sqrt{p}$, that is, $x', y' \in \{0, 1, \ldots, \lfloor\sqrt{p}\rfloor\}$. There are $(\lfloor\sqrt{p}\rfloor + 1)^2$ such pairs. Using the estimate $\lfloor x \rfloor + 1 > x$ for $x = \sqrt{p}$, we see that we have more than $p$ such pairs of integers. Thus for any $s \in \mathbb{Z}$, it is impossible that all the values $x' - sy'$ produced by the pairs $(x', y')$ are distinct modulo $p$. That is, for every $s$ there are two distinct pairs

$$(x', y'),\ (x'', y'') \in \{0, 1, \ldots, \lfloor\sqrt{p}\rfloor\}^2$$

with $x' - sy' \equiv x'' - sy'' \pmod p$. Now we take differences: We have $x' - x'' \equiv s(y' - y'') \pmod p$. Thus if we define $x := |x' - x''|$, $y := |y' - y''|$, then we get

$$(x, y) \in \{0, 1, \ldots, \lfloor\sqrt{p}\rfloor\}^2 \quad\text{with}\quad x \equiv \pm sy \pmod p.$$

Also we know that not both $x$ and $y$ can be zero, because the pairs $(x', y')$ and $(x'', y'')$ are distinct.

For $p = 13$, $\lfloor\sqrt{p}\rfloor = 3$, we consider $x', y' \in \{0, 1, 2, 3\}$. For $s = 5$, the expression $x' - sy' \pmod{13}$ assumes the following values:

         y' = 0   1   2   3
x' = 0        0   8   3  11
x' = 1        1   9   4  12
x' = 2        2  10   5   0
x' = 3        3  11   6   1


Now let $s$ be a solution of $s^2 \equiv -1 \pmod p$, which exists by Lemma 1. Then $x^2 \equiv s^2 y^2 \equiv -y^2 \pmod p$, and so we have produced

$$(x, y) \in \mathbb{Z}^2 \quad\text{with}\quad 0 < x^2 + y^2 < 2p \quad\text{and}\quad x^2 + y^2 \equiv 0 \pmod p.$$

But $p$ is the only number between 0 and $2p$ that is divisible by $p$. Thus $x^2 + y^2 = p$: done!
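Thue’s pigeonhole step is constructive if one is willing to scan all $(\lfloor\sqrt{p}\rfloor + 1)^2$ pairs. The following sketch uses function names of our own. It finds $s$ with $s^2 \equiv -1$ by plain search, which is only sensible for small $p$. It then returns the differences of the first two colliding pairs.

```python
from math import isqrt


def thue_pair(p, s):
    """Two of the (isqrt(p)+1)^2 > p pairs (x', y') must agree in x' - s*y' mod p;
    return their differences x = |x' - x''|, y = |y' - y''|."""
    r = isqrt(p)
    first_seen = {}
    for x1 in range(r + 1):
        for y1 in range(r + 1):
            v = (x1 - s * y1) % p
            if v in first_seen:
                x2, y2 = first_seen[v]
                return abs(x1 - x2), abs(y1 - y2)
            first_seen[v] = (x1, y1)
    raise AssertionError("unreachable: more pairs than residues")


def two_squares_thue(p):
    """For a prime p = 4m + 1: pick s with s^2 = -1 (mod p), then apply thue_pair."""
    s = next(t for t in range(2, p) if t * t % p == p - 1)
    return thue_pair(p, s)


print(two_squares_thue(13))  # (2, 3), from the collision of (0, 0) and (2, 3) in the table
```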


Our second proof for the proposition — also clearly a Book Proof — 
was discovered by Roger Heath-Brown in 1971 and appeared in 1984. 
(A condensed “one-sentence version” was given by Don Zagier.) It is so 
elementary that we don’t even need to use Lemma 1. 

Heath-Brown’s argument features three linear involutions: a quite obvious 
one, a hidden one, and a trivial one that gives “the final blow.” The second, 
unexpected, involution corresponds to some hidden structure on the set of 
integral solutions of the equation $4xy + z^2 = p$.
Proof. We study the set

$$S := \{(x, y, z) \in \mathbb{Z}^3 : 4xy + z^2 = p,\ x > 0,\ y > 0\}.$$

This set is finite. Indeed, $x \ge 1$ and $y \ge 1$ implies $y \le \frac{p}{4}$ and $x \le \frac{p}{4}$. So there are only finitely many possible values for $x$ and $y$, and given $x$ and $y$, there are at most two values for $z$.


1. The first linear involution is given by

$$f : S \to S, \quad (x, y, z) \mapsto (y, x, -z),$$

that is, “interchange $x$ and $y$, and negate $z$.” This clearly maps $S$ to itself, and it is an involution: Applied twice, it yields the identity. Also, $f$ has no fixed points, since $z = 0$ would imply $p = 4xy$, which is impossible. Furthermore, $f$ maps the solutions in

$$T := \{(x, y, z) \in S : z > 0\}$$

to the solutions in $S \setminus T$, which satisfy $z < 0$. Also, $f$ reverses the signs of $x - y$ and of $z$, so it maps the solutions in

$$U := \{(x, y, z) \in S : (x - y) + z > 0\}$$

to the solutions in $S \setminus U$. For this we have to see that there is no solution with $(x - y) + z = 0$, but there is none since this would give $p = 4xy + z^2 = 4xy + (x - y)^2 = (x + y)^2$.

What do we get from the study of $f$? The main observation is that since $f$ maps the sets $T$ and $U$ to their complements, it also interchanges the elements in $T \setminus U$ with those in $U \setminus T$. That is, there is the same number of solutions in $U$ that are not in $T$ as there are solutions in $T$ that are not in $U$, so $T$ and $U$ have the same cardinality.


2. The second involution that we study is an involution on the set $U$:

$$g : U \to U, \quad (x, y, z) \mapsto (x - y + z,\ y,\ 2y - z).$$

First we check that indeed this is a well-defined map: If $(x, y, z) \in U$, then $x - y + z > 0$, $y > 0$ and $4(x - y + z)y + (2y - z)^2 = 4xy + z^2$, so $g(x, y, z) \in S$. By $(x - y + z) - y + (2y - z) = x > 0$ we find that indeed $g(x, y, z) \in U$.

Also $g$ is an involution: $g(x, y, z) = (x - y + z, y, 2y - z)$ is mapped by $g$ to $\big((x - y + z) - y + (2y - z),\ y,\ 2y - (2y - z)\big) = (x, y, z)$.

And finally $g$ has exactly one fixed point:

$$(x, y, z) = g(x, y, z) = (x - y + z,\ y,\ 2y - z)$$

implies that $y = z$, but then $p = 4xy + y^2 = (4x + y)y$, which holds only for $y = z = 1$ and $x = \frac{p-1}{4}$.


But if g is an involution on U that has exactly one fixed point, then the 
cardinality of U is odd. 
3. The third, trivial, involution that we study is the involution on $T$ that interchanges $x$ and $y$:

$$h : T \to T, \quad (x, y, z) \mapsto (y, x, z).$$

This map is clearly well-defined, and an involution. We combine now our knowledge derived from the other two involutions: The cardinality of $T$ is equal to the cardinality of $U$, which is odd. But if $h$ is an involution on a finite set of odd cardinality, then it has a fixed point: There is a point $(x, y, z) \in T$ with $x = y$, that is, a solution of

$$p = 4x^2 + z^2 = (2x)^2 + z^2.$$
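For small primes, all three steps of Heath-Brown’s argument can be checked by enumerating $S$, $T$ and $U$ outright. This is a sketch under that brute-force assumption, and the function name is our own. The `assert` statements mirror the claims of steps 1 to 3 (they are skipped under `python -O`). The third returned value is $|T|$; for $p = 37$ it matches the 7 winged squares in Spivak’s picture.

```python
from math import isqrt


def heath_brown(p):
    """Run the three involutions for a prime p = 4m + 1; return (2x, z, |T|)."""
    S = set()
    for x in range(1, p // 4 + 1):
        for y in range(1, p // (4 * x) + 1):
            z = isqrt(p - 4 * x * y)
            if z * z == p - 4 * x * y:
                S |= {(x, y, z), (x, y, -z)}
    T = {t for t in S if t[2] > 0}
    U = {t for t in S if t[0] - t[1] + t[2] > 0}

    def f(x, y, z):
        return (y, x, -z)

    def g(x, y, z):
        return (x - y + z, y, 2 * y - z)

    def h(x, y, z):
        return (y, x, z)

    # Step 1: f is a fixed-point-free involution swapping T and U with their complements.
    assert all(f(*f(*t)) == t and f(*t) != t for t in S)
    assert {f(*t) for t in T} == S - T and {f(*t) for t in U} == S - U
    assert len(T) == len(U)
    # Step 2: g is an involution on U whose only fixed point is ((p-1)/4, 1, 1).
    assert all(g(*u) in U and g(*g(*u)) == u for u in U)
    assert [u for u in U if g(*u) == u] == [((p - 1) // 4, 1, 1)]
    assert len(U) % 2 == 1
    # Step 3: h has a fixed point (x, x, z) in T, and p = (2x)^2 + z^2.
    x, _, z = min(t for t in T if h(*t) == t)
    return 2 * x, z, len(T)


print(heath_brown(37))  # (6, 1, 7)
```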


Roger Heath-Brown came up with this proof in 1971, after studying an account of Liouville’s papers on identities for parity functions. The second involution seems magical, and it was presented without an explanation of how one could come up with it. There is, however, a geometric interpretation that beautifully visualizes and “explains” the involution and yields something like a “proof without words”: We will summarize it (for $p = 37$) in a full-page picture on the next page. This version of the proof was apparently found by Alexander Spivak, a Moscow mathematics teacher, who presented it in a 2007 lecture for the “Mathematics Circle” for high school students at Moscow State University.


Proof. Again we fix a prime number $p = 4n + 1$ and consider the set of solutions

$$T = \{(x, y, z) \in \mathbb{N}^3 : 4xy + z^2 = p\}.$$

Each element of this set gives rise to a winged square: This is the figure consisting of a square and four rectangles in the plane that you get if you start with a square of side length $z$ and at each vertex attach a rectangle of side lengths $x$ and $y$ in a rotation-symmetric way, such that the edge of length $x$ points away from the square, while the edge of length $y$ runs along the side of the square.


We consider two winged squares “the same” if they are congruent. One way to make this unique, such that the representation of the winged square depends only on its boundary curve, is to require that the L formed by the two edges in the upper right-hand corner is at least as high as it is wide. If this condition is not satisfied, then a mirror image (reflected, e.g., in a vertical axis) will repair this. So each solution in $T$ corresponds to a unique winged square of area $4xy + z^2 = p$, and indeed this is reversible: From each winged square we can read off a solution.


Taking the union of the square and the four rectangles, we get for each winged square what we will call a unique winged shape: This is a polyomino of area $p$ with four-fold rotation symmetry, which has twelve vertices: eight convex ones with inner right angle and four non-convex ones with outer right angle. (We can’t get a square shape, since $p$ is a prime, so it can’t be a square number.) Again we will consider winged shapes “the same” if they are congruent, so we might assume that the L shape in the upper right-hand corner is at least as high as it is wide.


On a finite set of odd cardinality, every involution has at least one fixed point.

[Figure: The winged square of area $4xy + z^2 = 73$ that corresponds to $(x, y, z) = (4, 3, 5)$, with the L shape highlighted, and its winged shape.]
[Figure: Spivak’s proof, for $n = 9$ and $p = 37$, where the set $T$ of winged squares has cardinality 7, while the set $W$ of winged shapes has cardinality 4.]
Now we are getting very close to the punch line: For each winged shape we get either one or two winged squares, by simultaneously drawing, in a rotation-symmetric way, vertical and horizontal lines to the interior starting at the non-convex vertices. We get only one solution if the shape has the symmetry of a square, that is, if the two arms of the L shapes have the same length. This happens exactly if $y = z$, but then $p = 4xz + z^2 = (4x + z)z$; assuming that $p$ is a prime, this implies that $z = 1$ and $x = n$. In other words: Exactly one winged shape yields a single winged square, while all other winged shapes yield two winged squares each. Consequently, the number $|T|$ of winged squares is odd.

However, the winged squares with non-square rectangles (with $x \ne y$) come in pairs, as we can always flip the four rectangular wings between vertical and horizontal format (that is, exchange $x$ and $y$). As $|T|$ is odd, this implies that there is an odd number of winged squares whose wings are squares, that is, $T$ contains an odd number of triples $(x, y, z)$ with $x = y$, and hence at least one, and this yields a solution to $(2x)^2 + z^2 = p$.


In any representation of $p = 4n + 1$ as a sum of two squares, one of the squares is even, the other one is odd. Thus the involution proof yields more than just that $p$ can be written as a sum of two squares: The number of these representations in positive integers is odd. (The representation is actually unique, see [3].) Also note that the proofs we have presented are not effective: Try to find $x$ and $y$ for a ten-digit prime! Efficient ways to find such representations are discussed in [1] and [8].

The following theorem completely answers the question which started this 
chapter. 


Theorem. A natural number n can be represented as a sum of two squares 
if and only if every prime factor of the form p = 4m + 3 appears with an 
even exponent in the prime decomposition of n. 


Proof. Call a number $n$ representable if it is a sum of two squares, that is, if $n = x^2 + y^2$ for some $x, y \in \mathbb{N}_0$. The theorem is a consequence of the following five facts.


(1) $1 = 1^2 + 0^2$ and $2 = 1^2 + 1^2$ are representable. Every prime of the form $p = 4m + 1$ is representable.

(2) The product of any two representable numbers $n_1 = x_1^2 + y_1^2$ and $n_2 = x_2^2 + y_2^2$ is representable: $n_1 n_2 = (x_1x_2 + y_1y_2)^2 + (x_1y_2 - x_2y_1)^2$.

(3) If $n$ is representable, $n = x^2 + y^2$, then also $nz^2$ is representable, by $nz^2 = (xz)^2 + (yz)^2$.


Facts (1), (2) and (3) together yield the “if” part of the theorem. 


(4) If $p = 4m + 3$ is a prime that divides a representable number $n = x^2 + y^2$, then $p$ divides both $x$ and $y$, and thus $p^2$ divides $n$. In fact, if we had $x \not\equiv 0 \pmod p$, then we could find $\bar{x}$ such that $x\bar{x} \equiv 1 \pmod p$, multiply the equation $x^2 + y^2 \equiv 0$ by $\bar{x}^2$, and thus we would obtain that $1 + y^2\bar{x}^2 = 1 + (\bar{x}y)^2 \equiv 0 \pmod p$, which is impossible for $p = 4m + 3$ by Lemma 1. So $x \equiv 0$, and then $y^2 \equiv 0$ gives $y \equiv 0 \pmod p$ as well.

(5) If $n$ is representable, and $p = 4m + 3$ divides $n$, then $p^2$ divides $n$, and $n/p^2$ is representable. This follows from (4), and completes the proof.

[Figure: The second winged square derived from the winged shape of area 73 in the margin on page 23. It represents the solution $(6, 3, 1)$.]
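The theorem can be cross-checked against brute force on an initial range. The sketch below uses names of our own. `criterion` implements the exponent condition by trial division, `compose` is the product formula of fact (2), and the range bound 3000 is arbitrary.

```python
from math import isqrt


def is_sum_of_two_squares(n):
    """Brute force: is n = x^2 + y^2 for integers x, y >= 0?"""
    return any(isqrt(n - x * x) ** 2 == n - x * x for x in range(isqrt(n) + 1))


def criterion(n):
    """The theorem's test: every prime factor 4m + 3 of n has an even exponent."""
    d = 2
    while d * d <= n:
        e = 0
        while n % d == 0:
            n //= d
            e += 1
        if d % 4 == 3 and e % 2 == 1:
            return False
        d += 1
    return not (n > 1 and n % 4 == 3)  # a leftover prime factor has exponent 1


def compose(x1, y1, x2, y2):
    """Fact (2): (x1^2 + y1^2)(x2^2 + y2^2) = (x1x2 + y1y2)^2 + (x1y2 - x2y1)^2."""
    return x1 * x2 + y1 * y2, x1 * y2 - x2 * y1


print(compose(2, 1, 3, 2))  # (8, 1): 5 * 13 = 65 = 8^2 + 1^2
print(all(is_sum_of_two_squares(n) == criterion(n) for n in range(1, 3000)))  # True
```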


Two remarks close our discussion:

• If $a$ and $b$ are two natural numbers that are relatively prime, then there are infinitely many primes of the form $am + b$ $(m \in \mathbb{N})$; this is a famous (and difficult) theorem of Dirichlet. More precisely, one can show that the number of primes $p \le x$ of the form $p = am + b$ is described very accurately for large $x$ by the function $\frac{1}{\varphi(a)}\frac{x}{\log x}$, where $\varphi(a)$ denotes the number of $b$ with $1 \le b < a$ that are relatively prime to $a$. (This is a substantial refinement of the prime number theorem, which we had discussed on page 12.)

• This means that the primes for fixed $a$ and varying $b$ appear essentially at the same rate. Nevertheless, for example for $a = 4$ one can observe a rather subtle, but still noticeable and persistent tendency towards “more” primes of the form $4m + 3$. The difference between the counts of primes of the form $4m + 3$ and those of the form $4m + 1$ changes sign infinitely often. However, if you look for a large random $x$, then chances are that there are more primes $p \le x$ of the form $p = 4m + 3$ than of the form $p = 4m + 1$. This effect is known as “Chebyshev’s bias”; see Riesel [4] and Rubinstein and Sarnak [5].
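Chebyshev’s bias is easy to observe with a sieve. The sketch below uses a function name of our own. It counts primes $p \le x$ in the two classes modulo 4.

```python
from math import isqrt


def bias_counts(x):
    """Return (#primes p <= x with p = 1 mod 4, #primes p <= x with p = 3 mod 4)."""
    is_prime = bytearray([1]) * (x + 1)
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, isqrt(x) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytes(len(range(i * i, x + 1, i)))
    ones = sum(is_prime[p] for p in range(5, x + 1, 4))
    threes = sum(is_prime[p] for p in range(3, x + 1, 4))
    return ones, threes


print(bias_counts(100))  # (11, 13)
```

Trying larger values of `x` shows the persistent lead of the class $4m + 3$ described above. The lead is not permanent, since the difference changes sign infinitely often.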
Chapter 5

The law of quadratic reciprocity

Which famous mathematical theorem has been proved most often? Pythagoras would certainly be a good candidate or the fundamental theorem of algebra, but the champion is without doubt the law of quadratic reciprocity in number theory. In an admirable monograph Franz Lemmermeyer lists as of the year 2000 no fewer than 196 proofs. Many of them are of course only slight variations of others, but the array of different ideas is still impressive, as is the list of contributors. Carl Friedrich Gauss gave the first complete proof in 1801 and followed up with seven more. A little later Ferdinand Gotthold Eisenstein added five more, and the ongoing list of provers reads like a Who’s Who of mathematics.

With so many proofs present, the question of which of them belongs in the Book can have no easy answer. Is it the shortest, the most unexpected, or should one look for the proof that had the greatest potential for generalizations to other and deeper reciprocity laws? We have chosen two proofs (based on Gauss’ third and sixth proofs), of which the first may be the simplest and most pleasing, while the other is the starting point for fundamental results in more general structures.

As in the previous chapter we work “modulo $p$”, where $p$ is an odd prime; $\mathbb{Z}_p$ is the field of residues upon division by $p$, and we usually (but not always) take these residues as $0, 1, \ldots, p-1$. Consider some $a \not\equiv 0 \pmod p$, that is, $p \nmid a$. We call $a$ a quadratic residue modulo $p$ if $a \equiv b^2 \pmod p$ for some $b$, and a quadratic nonresidue otherwise. Since $(p - b)^2 \equiv b^2$, the quadratic residues are therefore $1^2, 2^2, \ldots, \big(\frac{p-1}{2}\big)^2$, and so there are $\frac{p-1}{2}$ quadratic residues and $\frac{p-1}{2}$ quadratic nonresidues. Indeed, if $i^2 \equiv j^2 \pmod p$ with $1 \le i, j \le \frac{p-1}{2}$, then $p \mid i^2 - j^2 = (i - j)(i + j)$. As $2 \le i + j \le p - 1$ we have $p \mid i - j$, that is, $i \equiv j \pmod p$.

[Portrait: Carl Friedrich Gauss]

For $p = 13$, the quadratic residues are $1^2 = 1$, $2^2 = 4$, $3^2 = 9$, $4^2 \equiv 3$, $5^2 \equiv 12$, and $6^2 \equiv 10$; the nonresidues are $2, 5, 6, 7, 8, 11$.

At this point it is convenient to introduce the so-called Legendre symbol. Let $a \not\equiv 0 \pmod p$, then

$$\Big(\frac{a}{p}\Big) := \begin{cases} \phantom{-}1 & \text{if } a \text{ is a quadratic residue,} \\ -1 & \text{if } a \text{ is a quadratic nonresidue.} \end{cases}$$

The story begins with Fermat’s “little theorem”: For $a \not\equiv 0 \pmod p$,

$$a^{p-1} \equiv 1 \pmod p. \qquad (1)$$

In fact, since $\mathbb{Z}_p^* = \mathbb{Z}_p \setminus \{0\}$ is a group with multiplication, the set $\{1a, 2a, 3a, \ldots, (p-1)a\}$ runs again through all nonzero residues,

$$(1a)(2a)\cdots((p-1)a) \equiv 1 \cdot 2 \cdots (p-1) \pmod p,$$

and hence by dividing by $(p-1)!$, we get $a^{p-1} \equiv 1 \pmod p$.

Alternatively, this is just $a^{|G|} = 1$ for the group $G = \mathbb{Z}_p^*$ (see the box on Lagrange’s theorem, p. 4).

© Springer-Verlag GmbH Germany, part of Springer Nature 2018
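Both facts of this page can be checked mechanically for a given prime. The sketch below uses function names of our own:

- `multiplication_permutes` tests the step “$\{1a, \ldots, (p-1)a\}$ runs again through all nonzero residues”.
- `quadratic_residues` lists $1^2, \ldots, \big(\frac{p-1}{2}\big)^2$ modulo $p$.

```python
def multiplication_permutes(a, p):
    """{1a, 2a, ..., (p-1)a} mod p is again {1, ..., p-1} when p does not divide a."""
    return sorted(a * i % p for i in range(1, p)) == list(range(1, p))


def quadratic_residues(p):
    """The residues 1^2, 2^2, ..., ((p-1)/2)^2 mod p, which are pairwise distinct."""
    return sorted(i * i % p for i in range(1, (p - 1) // 2 + 1))


p = 13
print(all(multiplication_permutes(a, p) and pow(a, p - 1, p) == 1 for a in range(1, p)))  # True
print(quadratic_residues(p))                                  # [1, 3, 4, 9, 10, 12]
print(sorted(set(range(1, p)) - set(quadratic_residues(p))))  # [2, 5, 6, 7, 8, 11]
```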
For example, for $p = 17$ and $a = 3$ we have $3^8 = (3^4)^2 = 81^2 \equiv (-4)^2 \equiv -1 \pmod{17}$, while for $a = 2$ we get $2^8 = (2^4)^2 \equiv (-1)^2 \equiv 1 \pmod{17}$. Hence 2 is a quadratic residue, while 3 is a nonresidue.

Example: $\big(\frac{3}{17}\big) = \big(\frac{17}{3}\big) = \big(\frac{2}{3}\big) = -1$, so 3 is a nonresidue mod 17.


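In other words, the polynomial $x^{p-1} - 1 \in \mathbb{Z}_p[x]$ has as roots all nonzero residues. Next we note that

$$x^{p-1} - 1 = \big(x^{\frac{p-1}{2}} - 1\big)\big(x^{\frac{p-1}{2}} + 1\big).$$

Suppose $a \equiv b^2 \pmod p$ is a quadratic residue. Then by Fermat’s little theorem $a^{\frac{p-1}{2}} \equiv b^{p-1} \equiv 1 \pmod p$. Since a polynomial of degree $\frac{p-1}{2}$ over the field $\mathbb{Z}_p$ has at most $\frac{p-1}{2}$ roots, the quadratic residues are precisely the roots of the first factor $x^{\frac{p-1}{2}} - 1$, and the $\frac{p-1}{2}$ nonresidues must thus be the roots of the second factor $x^{\frac{p-1}{2}} + 1$. Comparing this to the definition of the Legendre symbol, we obtain the following important tool.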


Euler’s criterion. For $a \not\equiv 0 \pmod p$,

$$\Big(\frac{a}{p}\Big) \equiv a^{\frac{p-1}{2}} \pmod p.$$
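Euler’s criterion turns the definition of the Legendre symbol into a single modular exponentiation. The sketch below uses function names of our own. It compares the two for small primes and also checks the product rule stated next.

```python
def legendre_by_definition(a, p):
    """+1 if a is a nonzero square modulo p, -1 otherwise (p does not divide a)."""
    return 1 if any(b * b % p == a % p for b in range(1, p)) else -1


def legendre_by_euler(a, p):
    """Euler's criterion: a^((p-1)/2) mod p is 1 or p - 1; read p - 1 as -1."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def check_euler_and_product_rule(p):
    units = range(1, p)
    euler_ok = all(legendre_by_definition(a, p) == legendre_by_euler(a, p) for a in units)
    product_ok = all(
        legendre_by_euler(a * b, p) == legendre_by_euler(a, p) * legendre_by_euler(b, p)
        for a in units for b in units
    )
    return euler_ok and product_ok


print(legendre_by_euler(2, 17), legendre_by_euler(3, 17))  # 1 -1
```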


This gives us at once the important product rule

$$\Big(\frac{ab}{p}\Big) = \Big(\frac{a}{p}\Big)\Big(\frac{b}{p}\Big), \qquad (2)$$

since this obviously holds for the right-hand side of Euler’s criterion (both sides are $\pm 1$ and $p > 2$, so the congruence is an equality). The product rule is extremely helpful when one tries to compute Legendre symbols: Since any integer is a product of $\pm 1$ and primes we only have to compute $\big(\frac{-1}{p}\big)$, $\big(\frac{2}{p}\big)$, and $\big(\frac{q}{p}\big)$ for odd primes $q$.

By Euler’s criterion $\big(\frac{-1}{p}\big) = 1$ if $p \equiv 1 \pmod 4$, and $\big(\frac{-1}{p}\big) = -1$ if $p \equiv 3 \pmod 4$, something we have already seen in the previous chapter. The case $\big(\frac{2}{p}\big)$ will follow from the Lemma of Gauss below: $\big(\frac{2}{p}\big) = 1$ if $p \equiv \pm 1 \pmod 8$, while $\big(\frac{2}{p}\big) = -1$ if $p \equiv \pm 3 \pmod 8$.

Euler, Legendre, and Gauss did lots of calculations with quadratic residues and, in particular, studied the relations between $q$ being a quadratic residue modulo $p$ and $p$ being a quadratic residue modulo $q$, when $p$ and $q$ are odd primes. Euler and Legendre thus discovered the following remarkable theorem, but they managed to prove it only in special cases. However, Gauss was successful: On April 8, 1796 he was proud to record in his diary the first full proof.


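Law of quadratic reciprocity. Let $p$ and $q$ be different odd primes. Then

$$\Big(\frac{q}{p}\Big)\Big(\frac{p}{q}\Big) = (-1)^{\frac{p-1}{2}\frac{q-1}{2}}.$$

If $p \equiv 1 \pmod 4$ or $q \equiv 1 \pmod 4$, then $\frac{p-1}{2}$ (resp. $\frac{q-1}{2}$) is even, and therefore $(-1)^{\frac{p-1}{2}\frac{q-1}{2}} = 1$; thus $\big(\frac{q}{p}\big) = \big(\frac{p}{q}\big)$. When $p \equiv q \equiv 3 \pmod 4$, we have $\big(\frac{q}{p}\big) = -\big(\frac{p}{q}\big)$. Thus for odd primes we get $\big(\frac{q}{p}\big) = \big(\frac{p}{q}\big)$ unless both $p$ and $q$ are congruent to 3 (mod 4).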
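A numerical check of the law is no proof, but it makes the sign rule concrete. The sketch below uses function names of our own. It evaluates both Legendre symbols by Euler’s criterion for all pairs of odd primes below an arbitrary bound.

```python
def legendre(a, p):
    """Legendre symbol via Euler's criterion (p an odd prime not dividing a)."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def odd_primes_below(n):
    return [p for p in range(3, n) if all(p % d for d in range(2, int(p ** 0.5) + 1))]


def reciprocity_holds(p, q):
    lhs = legendre(q, p) * legendre(p, q)
    rhs = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
    return lhs == rhs


primes = odd_primes_below(300)
print(all(reciprocity_holds(p, q) for p in primes for q in primes if p != q))  # True
```

For example, $p = 3$, $q = 7$ are both $\equiv 3 \pmod 4$, and indeed `legendre(3, 7)` is $-1$ while `legendre(7, 3)` is $1$.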
First proof. The key to our first proof (which is Gauss’ third) is a counting formula that soon came to be called the Lemma of Gauss:

Lemma of Gauss. Suppose $a \not\equiv 0 \pmod p$. Take the numbers $1a, 2a, \ldots, \frac{p-1}{2}a$ and reduce them modulo $p$ to the residue system smallest in absolute value, $ia \equiv r_i \pmod p$ with $-\frac{p-1}{2} \le r_i \le \frac{p-1}{2}$ for all $i$. Then

$$\Big(\frac{a}{p}\Big) = (-1)^s, \quad\text{where } s = \#\{i : r_i < 0\}.$$


Proof. Suppose $u_1, \ldots, u_s$ are the residues smaller than 0, and that $v_1, \ldots, v_{\frac{p-1}{2}-s}$ are those greater than 0. Then the numbers $-u_1, \ldots, -u_s$ are between 1 and $\frac{p-1}{2}$ and are all different from the $v_j$’s (see the margin); hence $\{-u_1, \ldots, -u_s, v_1, \ldots, v_{\frac{p-1}{2}-s}\} = \{1, 2, \ldots, \frac{p-1}{2}\}$. Therefore

$$\prod_i (-u_i)\prod_j v_j = \tfrac{p-1}{2}!,$$

which implies

$$(-1)^s\prod_i u_i\prod_j v_j \equiv \tfrac{p-1}{2}! \pmod p.$$

If $-u_i = v_j$, then $u_i + v_j \equiv 0 \pmod p$. Now $u_i \equiv ka$, $v_j \equiv \ell a \pmod p$ implies $p \mid (k + \ell)a$. As $p$ and $a$ are relatively prime, $p$ must divide $k + \ell$, which is impossible, since $k + \ell \le p - 1$.

Now remember how we obtained the numbers $u_i$ and $v_j$; they are the residues of $1a, \ldots, \frac{p-1}{2}a$. Hence

$$\tfrac{p-1}{2}! \equiv (-1)^s\prod_i u_i\prod_j v_j \equiv (-1)^s\, \tfrac{p-1}{2}!\; a^{\frac{p-1}{2}} \pmod p.$$

Cancelling $\frac{p-1}{2}!$ together with Euler’s criterion gives

$$\Big(\frac{a}{p}\Big) \equiv a^{\frac{p-1}{2}} \equiv (-1)^s \pmod p,$$

and therefore $\big(\frac{a}{p}\big) = (-1)^s$, since $p$ is odd.


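With this we can easily compute $\big(\frac{2}{p}\big)$: Since $1 \cdot 2, 2 \cdot 2, \ldots, \frac{p-1}{2} \cdot 2$ are all between 1 and $p - 1$, we have

$$s = \#\{i : \tfrac{p-1}{2} < 2i \le p - 1\} = \tfrac{p-1}{2} - \#\{i : 2i \le \tfrac{p-1}{2}\} = \Big\lceil \frac{p-1}{4} \Big\rceil.$$

Check that $s$ is even precisely for $p = 8k \pm 1$.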
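The count $s$ in the Lemma of Gauss is a one-line loop. In the sketch below (function names are our own), a residue $ia \bmod p$ in $\{0, \ldots, p-1\}$ is negative in the symmetric system exactly when it exceeds $\frac{p-1}{2}$. The check function compares $(-1)^s$ with Euler’s criterion. It also checks the formula $s = \lceil (p-1)/4 \rceil = \lfloor (p+2)/4 \rfloor$ for $a = 2$, and the margin exercise.

```python
def legendre(a, p):
    """Legendre symbol via Euler's criterion (p an odd prime not dividing a)."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def gauss_count(a, p):
    """s = #{i : 1 <= i <= (p-1)/2, the absolutely least residue of i*a is negative}."""
    half = (p - 1) // 2
    return sum(1 for i in range(1, half + 1) if (i * a) % p > half)


def check_gauss_lemma(limit):
    for p in [q for q in range(3, limit) if all(q % d for d in range(2, q))]:
        if any(legendre(a, p) != (-1) ** gauss_count(a, p) for a in range(1, p)):
            return False
        s2 = gauss_count(2, p)
        if s2 != (p + 2) // 4 or (s2 % 2 == 0) != (p % 8 in (1, 7)):
            return False
    return True


print(gauss_count(11, 17), gauss_count(17, 11))  # 5 3
```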


The Lemma of Gauss is the basis for many of the published proofs of the 
quadratic reciprocity law. The most elegant may be the one suggested 
by Ferdinand Gotthold Eisenstein, who had learned number theory from 
Gauss’ famous Disquisitiones Arithmeticae and made important contribu- 
tions to “higher reciprocity theorems” before his premature death at age 29. 

His proof is just counting lattice points!
$p = 17$, $q = 11$: $s = 5$, $t = 3$, so $\big(\frac{11}{17}\big) = (-1)^5 = -1$ and $\big(\frac{17}{11}\big) = (-1)^3 = -1$.

Let $p$ and $q$ be odd primes, and consider $\big(\frac{q}{p}\big)$. Suppose $iq$ is a multiple of $q$ that reduces to a negative residue $r_i < 0$ in the Lemma of Gauss. This means that there is a unique integer $j$ such that $-\frac{p}{2} < iq - jp < 0$. Note that $0 < j < \frac{q}{2}$ since $0 < i < \frac{p}{2}$. In other words, $\big(\frac{q}{p}\big) = (-1)^s$, where $s$ is the number of lattice points $(x, y)$, that is, pairs of integers $x, y$ satisfying

$$0 < py - qx < \frac{p}{2}, \qquad 0 < x < \frac{p}{2}, \qquad 0 < y < \frac{q}{2}. \qquad (3)$$

Similarly, $\big(\frac{p}{q}\big) = (-1)^t$, where $t$ is the number of lattice points $(x, y)$ with

$$0 < qx - py < \frac{q}{2}, \qquad 0 < x < \frac{p}{2}, \qquad 0 < y < \frac{q}{2}. \qquad (4)$$

Now look at the rectangle with side lengths $\frac{p}{2}, \frac{q}{2}$, and draw the two lines parallel to the diagonal $py = qx$: $y = \frac{q}{p}x + \frac{1}{2}$ or $py - qx = \frac{p}{2}$, respectively, $y = \frac{q}{p}\big(x - \frac{1}{2}\big)$ or $qx - py = \frac{q}{2}$.

[Figure: The rectangle, its diagonal and the two parallels for $p = 17$, $q = 11$.]

The proof is now quickly completed by the following three observations:

1. There are no lattice points on the diagonal and the two parallels. This is so because $py = qx$ would imply $p \mid x$, which cannot be. For the parallels observe that $py - qx$ is an integer while $\frac{p}{2}$ and $\frac{q}{2}$ are not.

2. The lattice points satisfying (3) are precisely the points in the upper strip $0 < py - qx < \frac{p}{2}$, and those of (4) the points in the lower strip $0 < qx - py < \frac{q}{2}$. Hence the number of lattice points in the two strips is $s + t$.

3. The outer regions $R : py - qx > \frac{p}{2}$ and $S : qx - py > \frac{q}{2}$ contain the same number of points. To see this consider the map $\varphi : R \to S$ which maps $(x, y)$ to $\big(\frac{p+1}{2} - x, \frac{q+1}{2} - y\big)$, and check that $\varphi$ maps $R$ into $S$ and $S$ into $R$, and is an involution.

Since the total number of lattice points in the rectangle is $\frac{p-1}{2} \cdot \frac{q-1}{2}$, we infer that $s + t$ and $\frac{p-1}{2} \cdot \frac{q-1}{2}$ have the same parity, and so

$$\Big(\frac{q}{p}\Big)\Big(\frac{p}{q}\Big) = (-1)^{s+t} = (-1)^{\frac{p-1}{2}\frac{q-1}{2}}.$$
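Eisenstein’s count can be replayed on the actual lattice. The sketch below uses a function name of our own. It classifies every lattice point of the rectangle by $d = py - qx$ and compares $2d$ with $p$ and $q$ to avoid fractions. It returns the strip counts $s$ and $t$ together with the sizes of the outer regions $R$ and $S$.

```python
def eisenstein_counts(p, q):
    """Classify lattice points 0 < x < p/2, 0 < y < q/2 by d = p*y - q*x.

    Returns (s, t, r, s_outer): the upper strip 0 < d < p/2 (condition (3)),
    the lower strip 0 < -d < q/2 (condition (4)), and the outer regions R and S."""
    s = t = r = s_outer = 0
    for x in range(1, (p - 1) // 2 + 1):
        for y in range(1, (q - 1) // 2 + 1):
            d = p * y - q * x  # never 0, p/2 or -q/2 (observation 1)
            if 0 < 2 * d < p:
                s += 1
            elif 0 < -2 * d < q:
                t += 1
            elif d > 0:
                r += 1        # region R: p*y - q*x > p/2
            else:
                s_outer += 1  # region S: q*x - p*y > q/2
    return s, t, r, s_outer


print(eisenstein_counts(17, 11))  # (5, 3, 16, 16)
```

Of the $8 \cdot 5 = 40$ lattice points for $p = 17$, $q = 11$, the two strips contain $5 + 3 = 8$ points, and the outer regions contain 16 points each.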
Second proof. Our second choice does not use Gauss’ lemma; instead it employs so-called “Gauss sums” in finite fields. Gauss invented them in his study of the equation $x^p - 1 = 0$ and the arithmetical properties of the field $\mathbb{Q}(\zeta)$ (called cyclotomic field), where $\zeta$ is a $p$-th root of unity. They have been the starting point for the search for higher reciprocity laws in general number fields.

Let us first collect a few facts about finite fields.

A. Let $p$ and $q$ be different odd primes, and consider the finite field $F$ with $q^{p-1}$ elements. Its prime field is $\mathbb{Z}_q$, whence $qa = 0$ for any $a \in F$. This implies that $(a + b)^q = a^q + b^q$, since any binomial coefficient $\binom{q}{i}$ is a multiple of $q$ for $0 < i < q$, and thus 0 in $F$. Note that Euler’s criterion is an equation $\big(\frac{p}{q}\big) = p^{\frac{q-1}{2}}$ in the prime field $\mathbb{Z}_q$.

B. The multiplicative group $F^* = F \setminus \{0\}$ is cyclic of size $q^{p-1} - 1$ (see the box on the next page). Since by Fermat’s little theorem $p$ is a divisor of $q^{p-1} - 1$, there exists an element $\zeta \in F$ of order $p$, that is, $\zeta^p = 1$, and $\zeta$ generates the subgroup $\{\zeta, \zeta^2, \ldots, \zeta^p = 1\}$ of $F^*$. Note that any $\zeta^i$ $(1 \le i \le p-1)$ is again a generator. Hence we obtain the polynomial decomposition $x^p - 1 = (x - \zeta)(x - \zeta^2)\cdots(x - \zeta^p)$.

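Now we can go to work. Consider the Gauss sum

$$G := \sum_{i=1}^{p-1}\Big(\frac{i}{p}\Big)\zeta^i \in F,$$

where $\big(\frac{i}{p}\big)$ is the Legendre symbol. For the proof we derive two different expressions for $G^q$ and then set them equal.

First expression. We have

$$G^q = \sum_{i=1}^{p-1}\Big(\frac{i}{p}\Big)^q\zeta^{iq} = \sum_{i=1}^{p-1}\Big(\frac{i}{p}\Big)\zeta^{iq} = \Big(\frac{q}{p}\Big)\sum_{i=1}^{p-1}\Big(\frac{iq}{p}\Big)\zeta^{iq} = \Big(\frac{q}{p}\Big)G, \qquad (5)$$

where the first equality follows from $(a + b)^q = a^q + b^q$, the second uses that $\big(\frac{i}{p}\big)^q = \big(\frac{i}{p}\big)$ since $q$ is odd, the third one is derived from (2), which yields $\big(\frac{i}{p}\big) = \big(\frac{iq}{p}\big)\big(\frac{q}{p}\big)$, and the last one holds since $iq$ runs with $i$ through all nonzero residues modulo $p$.

Second expression. Suppose we can prove

$$G^2 = (-1)^{\frac{p-1}{2}}\,p, \qquad (6)$$

then we are quickly done. Indeed,

$$G^q = G\,(G^2)^{\frac{q-1}{2}} = G\,(-1)^{\frac{p-1}{2}\frac{q-1}{2}}\,p^{\frac{q-1}{2}} = G\,\Big(\frac{p}{q}\Big)(-1)^{\frac{p-1}{2}\frac{q-1}{2}}. \qquad (7)$$

Equating the expressions in (5) and (7) and cancelling $G$, which is nonzero by (6), we find $\big(\frac{q}{p}\big) = \big(\frac{p}{q}\big)(-1)^{\frac{p-1}{2}\frac{q-1}{2}}$, and thus

$$\Big(\frac{q}{p}\Big)\Big(\frac{p}{q}\Big) = (-1)^{\frac{p-1}{2}\frac{q-1}{2}}.$$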


Example: Take $p = 3$, $q = 5$. Then $G = \zeta - \zeta^2$ and $G^5 = \zeta^5 - \zeta^{10} = \zeta^2 - \zeta = -(\zeta - \zeta^2) = -G$, corresponding to $\big(\frac{5}{3}\big) = \big(\frac{2}{3}\big) = -1$.
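Constructing the field $F$ with $q^{p-1}$ elements takes some work, so the following sketch takes a shortcut. This is our simplification, not the text’s construction.

- It computes in the ring $\mathbb{Z}_q[x]/(x^p - 1)$, with $x$ playing the role of $\zeta$.
- It compares elements only modulo $1 + x + \cdots + x^{p-1}$, because the identities (5) and (6) use nothing beyond $\zeta^p = 1$, $\sum_{i=1}^{p}\zeta^i = 0$ and $(a + b)^q = a^q + b^q$.

The resulting ring need not be a field, but the check of (5) and of (6), whose proof follows, is still meaningful. Function names are our own.

```python
def legendre(a, p):
    """Legendre symbol via Euler's criterion; 0 if p divides a."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def mul(f, g, p, q):
    """Product in Z_q[x]/(x^p - 1); polynomials are coefficient lists of length p."""
    h = [0] * p
    for i, a in enumerate(f):
        if a:
            for j, b in enumerate(g):
                h[(i + j) % p] = (h[(i + j) % p] + a * b) % q
    return h


def power(f, e, p, q):
    """f**e in Z_q[x]/(x^p - 1) by repeated squaring."""
    result = [1] + [0] * (p - 1)
    while e:
        if e & 1:
            result = mul(result, f, p, q)
        f = mul(f, f, p, q)
        e >>= 1
    return result


def reduce_mod_phi(f, q):
    """Canonical form modulo 1 + x + ... + x^(p-1). In Z_q[x]/(x^p - 1) its multiples
    are exactly the constant vectors (c, ..., c), so we zero the top coefficient."""
    c = f[-1]
    return [(a - c) % q for a in f]


def check_gauss_sum(p, q):
    """Check (6) G^2 = (-1)^((p-1)/2) p and (5) G^q = (q/p) G."""
    G = [legendre(i, p) % q for i in range(p)]  # coefficient of x^i is (i/p)
    constant = [0] * p
    constant[0] = (-1) ** ((p - 1) // 2) * p % q
    eq6 = reduce_mod_phi(mul(G, G, p, q), q) == reduce_mod_phi(constant, q)
    rhs5 = [legendre(q, p) * a % q for a in G]
    eq5 = reduce_mod_phi(power(G, q, p, q), q) == reduce_mod_phi(rhs5, q)
    return eq6 and eq5


print(check_gauss_sum(3, 5))  # True
```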
The multiplicative group of a finite field is cyclic
Let $F^*$ be the multiplicative group of the field F, with $|F^*| = n$. Writing ord(a) for the order of an element, that is, the smallest positive integer k such that $a^k = 1$, we want to find an element $a \in F^*$ with ord(a) = n. If ord(b) = d, then by Lagrange's theorem, d divides n (see the margin on page 4). Classifying the elements according to their order, we have
$$ n = \sum_{d \mid n} \psi(d), \quad \text{where } \psi(d) := \#\{b \in F^* : \operatorname{ord}(b) = d\}. \qquad (8) $$
d| 


If ord(b) = d, then every element $b^i$ ($i = 1, \ldots, d$) satisfies $(b^i)^d = 1$ and is therefore a root of the polynomial $x^d - 1$. But, since F is a field, $x^d - 1$ has at most d roots, and so the elements $b, b^2, \ldots, b^d = 1$ are precisely these roots. In particular, every element of order d is of the form $b^i$.

On the other hand, it is easily checked that $\operatorname{ord}(b^i) = \frac{d}{(i,d)}$, where (i, d) denotes the greatest common divisor of i and d. Hence $\operatorname{ord}(b^i) = d$ if and only if (i, d) = 1, that is, if i and d are relatively prime. Denoting Euler's function by $\varphi(d) := \#\{i : 1 \le i \le d,\ (i,d) = 1\}$, we thus have $\psi(d) = \varphi(d)$ whenever $\psi(d) > 0$. Looking at (8) we find
$$ n = \sum_{d \mid n} \psi(d) \le \sum_{d \mid n} \varphi(d). $$
But, as we are going to show that
$$ \sum_{d \mid n} \varphi(d) = n, \qquad (9) $$
we must have $\psi(d) = \varphi(d)$ for all d. In particular, $\psi(n) = \varphi(n) \ge 1$, and so there is an element of order n.
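The counting argument can be watched in the simplest case, a prime field $\mathbb{Z}_q$. Restricting to prime q is our assumption, made so that the arithmetic is just integers mod q; a general finite field would need polynomial arithmetic. The sketch tabulates ψ(d) by computing the order of every element and compares it with φ(d); the helper names are hypothetical.

```python
from math import gcd


def order(b: int, q: int) -> int:
    """Multiplicative order of b modulo the prime q (b not divisible by q)."""
    k, x = 1, b % q
    while x != 1:
        x = x * b % q
        k += 1
    return k


def euler_phi(d: int) -> int:
    return sum(1 for i in range(1, d + 1) if gcd(i, d) == 1)


def psi_table(q: int) -> dict:
    """psi(d) = number of elements of order d in the multiplicative group of Z_q."""
    counts = {}
    for b in range(1, q):
        d = order(b, q)
        counts[d] = counts.get(d, 0) + 1
    return counts


q = 13
n = q - 1
psi = psi_table(q)
for d in range(1, n + 1):
    if n % d == 0:
        # psi(d) equals phi(d) for every divisor d; in particular psi(n) = phi(n) >= 1
        print(d, psi.get(d, 0), euler_phi(d))
```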


The following (folklore) proof of (9) belongs in The Book as well. Consider the n fractions
$$ \frac{1}{n}, \frac{2}{n}, \ldots, \frac{k}{n}, \ldots, \frac{n}{n}, $$
reduce them to lowest terms $\frac{k}{n} = \frac{i}{d}$ with $1 \le i \le d$, $(i,d) = 1$, $d \mid n$, and check that the denominator d appears precisely $\varphi(d)$ times.

“Even in total chaos we can hang on to the cyclic group”
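The fraction argument for (9) is easy to replay by machine. This short sketch (our own illustration, using Python's exact `Fraction` type, which reduces to lowest terms automatically) counts how often each denominator occurs among $\frac{1}{n}, \ldots, \frac{n}{n}$ and compares the count with φ(d).

```python
from collections import Counter
from fractions import Fraction
from math import gcd


def denominator_counts(n: int) -> Counter:
    """Reduce k/n for k = 1, ..., n to lowest terms and count each denominator d."""
    return Counter(Fraction(k, n).denominator for k in range(1, n + 1))


def euler_phi(d: int) -> int:
    return sum(1 for i in range(1, d + 1) if gcd(i, d) == 1)


n = 12
counts = denominator_counts(n)
for d in sorted(counts):
    print(d, counts[d], euler_phi(d))  # denominator d appears exactly phi(d) times
print(sum(counts.values()) == n)      # True: the n fractions are split up by denominator
```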


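It remains to verify (6), and for this we first make two simple observations:
• $\sum_{i=1}^{p} \zeta^i = 0$ and thus $\sum_{i=1}^{p-1} \zeta^i = -1$. Just note that $-\sum_{i=1}^{p} \zeta^i$ is the coefficient of $x^{p-1}$ in $x^p - 1 = \prod_{i=1}^{p} (x - \zeta^i)$, and thus 0.
• $\sum_{k=1}^{p-1} \left(\frac{k}{p}\right) = 0$ and thus $\sum_{k=1}^{p-2} \left(\frac{k}{p}\right) = -\left(\frac{-1}{p}\right)$, since there are equally many quadratic residues and nonresidues, and $p - 1 \equiv -1 \pmod p$.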
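We have
$$ G^2 = \Big(\sum_{i=1}^{p-1} \left(\tfrac{i}{p}\right) \zeta^i\Big)\Big(\sum_{j=1}^{p-1} \left(\tfrac{j}{p}\right) \zeta^j\Big) = \sum_{i,j} \left(\tfrac{ij}{p}\right) \zeta^{i+j}. $$
Setting $j \equiv ik \pmod p$, and using $\left(\frac{i \cdot ik}{p}\right) = \left(\frac{i^2 k}{p}\right) = \left(\frac{k}{p}\right)$, we find
$$ G^2 = \sum_{i,k} \left(\tfrac{k}{p}\right) \zeta^{i(1+k)} = \sum_{k=1}^{p-1} \left(\tfrac{k}{p}\right) \sum_{i=1}^{p-1} \zeta^{(1+k)i}. $$
For $k = p-1 \equiv -1 \pmod p$ this gives $\left(\frac{-1}{p}\right)(p-1)$, since $\zeta^{1+k} = 1$. Move k = p − 1 in front and write
$$ G^2 = \left(\tfrac{-1}{p}\right)(p-1) + \sum_{k=1}^{p-2} \left(\tfrac{k}{p}\right) \sum_{i=1}^{p-1} \zeta^{(1+k)i}. $$
Since $\zeta^{1+k}$ is a generator of the group for $k \neq p-1$, the inner sum equals $\sum_{i=1}^{p-1} \zeta^i = -1$ for all $k \neq p-1$ by our first observation. Hence the second summand is $-\sum_{k=1}^{p-2} \left(\frac{k}{p}\right) = \left(\frac{-1}{p}\right)$ by our second observation. It follows that $G^2 = \left(\frac{-1}{p}\right) p$ and thus with Euler's criterion $G^2 = (-1)^{\frac{p-1}{2}} p$, which completes the proof.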
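The computation of $G^2$ uses only $\zeta^p = 1$ and $\sum_{i=1}^{p}\zeta^i = 0$, so the same identity holds for a primitive p-th root of unity in $\mathbb{C}$. The sketch below checks (6) numerically over $\mathbb{C}$ with $\zeta = e^{2\pi i/p}$; this is a stand-in for the finite field F of the proof, chosen only because complex arithmetic is built into Python.

```python
import cmath


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, by listing the nonzero squares mod p."""
    a %= p
    if a == 0:
        return 0
    squares = {x * x % p for x in range(1, p)}
    return 1 if a in squares else -1


def gauss_sum(p: int) -> complex:
    """G = sum_{i=1}^{p-1} (i/p) zeta^i with zeta = exp(2 pi i / p) in C (not in F)."""
    zeta = cmath.exp(2j * cmath.pi / p)
    return sum(legendre(i, p) * zeta ** i for i in range(1, p))


for p in (3, 5, 7, 11, 13):
    g2 = gauss_sum(p) ** 2
    # G^2 should be (-1)^((p-1)/2) * p up to rounding error
    print(p, round(g2.real, 6), round(g2.imag, 6), (-1) ** ((p - 1) // 2) * p)
```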
The law of quadratic reciprocity


“I’m pushing 196 proofs 
for quadratic reciprocity” 


“What's up?” 


Every finite division ring is a field 


Rings are important structures in modern algebra. If a ring R has a multiplicative unit element 1 and every nonzero element has a multiplicative inverse, then R is called a division ring. So, all that is missing in R from being a field is the commutativity of multiplication. The best-known example of a noncommutative division ring is the ring of quaternions discovered by Hamilton. But, as the chapter title says, every such division ring must of necessity be infinite. If R is finite, then the axioms force the multiplication to be commutative.


This result, which is now a classic, has caught the imagination of many mathematicians, because, as Herstein writes: “It is so unexpectedly interrelating two seemingly unrelated things, the number of elements in a certain algebraic system and the multiplication of that system.”


Theorem. Every finite division ring is commutative. 


This beautiful theorem, which is usually attributed to MacLagan Wedderburn, has been proved by many people using a variety of different ideas. Wedderburn himself gave three proofs in 1905, and another proof was given by Leonard E. Dickson in the same year. More proofs were later given by Emil Artin, Hans Zassenhaus, Nicolas Bourbaki, and many others. One proof stands out for its simplicity and elegance. It was found by Ernst Witt in 1931 and combines two elementary ideas towards a glorious finish.


Proof. Our first ingredient comes from a blend of linear algebra and basic group theory. For an arbitrary element $s \in R$, let $C_s$ be the set $\{x \in R : xs = sx\}$ of elements which commute with s; $C_s$ is called the centralizer of s. Clearly, $C_s$ contains 0 and 1 and is a sub-division ring of R. The center Z is the set of elements which commute with all elements of R, thus $Z = \bigcap_{s \in R} C_s$. In particular, all elements of Z commute, 0 and 1 are in Z, and so Z is a finite field. Let us set |Z| = q.


We can regard R and $C_s$ as vector spaces over the field Z and deduce that $|R| = q^n$, where n is the dimension of the vector space R over Z, and similarly $|C_s| = q^{n_s}$ for suitable integers $n_s \ge 1$.


Now let us assume that R is not a field. This means that for some $s \in R$ the centralizer $C_s$ is not all of R, or, what is the same, $n_s < n$.


On the set $R^* := R \setminus \{0\}$ we consider the relation
$$ r' \sim r \quad :\Longleftrightarrow \quad r' = x^{-1} r x \ \text{ for some } x \in R^*. $$


It is easy to check that ∼ is an equivalence relation. Let
$$ A_s := \{x^{-1} s x : x \in R^*\} $$
be the equivalence class containing s. We note that $|A_s| = 1$ precisely when s is in the center Z. So by our assumption, there are classes $A_s$ with $|A_s| \ge 2$. Consider now for $s \in R^*$ the map $f_s : x \mapsto x^{-1} s x$ from $R^*$ onto $A_s$. For $x, y \in R^*$ we find
$$ x^{-1} s x = y^{-1} s y \iff (y x^{-1})\, s = s\, (y x^{-1}) \iff y x^{-1} \in C_s^* \iff y \in C_s^* x, $$
for $C_s^* := C_s \setminus \{0\}$, where $C_s^* x = \{zx : z \in C_s^*\}$ has size $|C_s^*|$. Hence any element $x^{-1} s x$ is the image of precisely $|C_s^*| = q^{n_s} - 1$ elements in $R^*$ under the map $f_s$, and we deduce $|R^*| = |A_s|\,|C_s^*|$. In particular, we note that
$$ \frac{|R^*|}{|C_s^*|} = \frac{q^n - 1}{q^{n_s} - 1} = |A_s| \quad \text{is an integer for all } s. $$

We know that the equivalence classes partition $R^*$. We now group the central elements $Z^*$ together and denote by $A_1, \ldots, A_t$ the equivalence classes containing more than one element. By our assumption we know $t \ge 1$. Since $|R^*| = |Z^*| + \sum_{k=1}^{t} |A_k|$, we have proved the so-called class formula
$$ q^n - 1 = q - 1 + \sum_{k=1}^{t} \frac{q^n - 1}{q^{n_k} - 1}, \qquad (1) $$
where $n_k := n_{s_k}$ for some $s_k \in A_k$, and where we have $1 < \frac{q^n - 1}{q^{n_k} - 1} \in \mathbb{N}$ for all k.
With (1) we have left abstract algebra and are back to the natural numbers. Next we claim that $q^{n_k} - 1 \mid q^n - 1$ implies $n_k \mid n$. Indeed, write $n = a n_k + r$ with $0 \le r < n_k$; then $q^{n_k} - 1 \mid q^{a n_k + r} - 1$ implies
$$ q^{n_k} - 1 \;\Big|\; (q^{a n_k + r} - 1) - (q^{n_k} - 1) = q^{n_k}\big(q^{(a-1) n_k + r} - 1\big), $$
and thus $q^{n_k} - 1 \mid q^{(a-1) n_k + r} - 1$, since $q^{n_k}$ and $q^{n_k} - 1$ are relatively prime. Continuing in this way we find $q^{n_k} - 1 \mid q^r - 1$ with $0 \le r < n_k$, which is only possible for r = 0, that is, $n_k \mid n$. In summary, we note
$$ n_k \mid n \quad \text{for all } k. \qquad (2) $$
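The number-theoretic step just proved can be tested by brute force. Since the converse ($n_k \mid n$ implies $q^{n_k} - 1 \mid q^n - 1$) also holds, this small sketch checks the full equivalence for small q and exponents; the function names and the ranges are arbitrary choices of ours.

```python
def divides(q: int, m: int, n: int) -> bool:
    """True if q^m - 1 divides q^n - 1."""
    return (q ** n - 1) % (q ** m - 1) == 0


def check_divisibility_rule(q_max: int = 7, n_max: int = 24) -> bool:
    """Compare 'q^m - 1 | q^n - 1' with 'm | n' for 2 <= q <= q_max and 1 <= m, n <= n_max."""
    return all(
        divides(q, m, n) == (n % m == 0)
        for q in range(2, q_max + 1)
        for m in range(1, n_max + 1)
        for n in range(1, n_max + 1)
    )


print(check_divisibility_rule())  # True
print(divides(2, 4, 6))           # False: 15 does not divide 63
```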


Now comes the second ingredient: the complex numbers $\mathbb{C}$. Consider the polynomial $x^n - 1$. Its roots in $\mathbb{C}$ are called the n-th roots of unity. Since $\lambda^n = 1$, all these roots λ have $|\lambda| = 1$ and lie therefore on the unit circle of the complex plane. In fact, they are precisely the numbers $\lambda_k = e^{\frac{2k\pi i}{n}} = \cos(2k\pi/n) + i \sin(2k\pi/n)$, $0 \le k \le n-1$ (see the box on the next page). Some of the roots λ satisfy $\lambda^d = 1$ for d < n; for example, for even n the root λ = −1 satisfies $\lambda^2 = 1$. For a root λ, let d be the smallest positive exponent with $\lambda^d = 1$, that is, d is the order of λ in the group of the roots of unity. Then d | n, by Lagrange's theorem (“the order of every element of a group divides the order of the group”; see the box in Chapter 1). Note that there are roots of order n, such as $\lambda_1 = e^{\frac{2\pi i}{n}}$.
Roots of unity
Any complex number z = x + iy may be written in the “polar” form
$$ z = r e^{i\varphi} = r(\cos\varphi + i \sin\varphi), $$
where $r = |z| = \sqrt{x^2 + y^2}$ is the distance of z to the origin, and φ is the angle measured from the positive x-axis, so that $x = r\cos\varphi$ and $y = r\sin\varphi$. The n-th roots of unity are therefore of the form
$$ \lambda_k = e^{\frac{2k\pi i}{n}} = \cos(2k\pi/n) + i \sin(2k\pi/n), \quad 0 \le k \le n-1, $$
since for all k
$$ \lambda_k^n = e^{2k\pi i} = \cos(2k\pi) + i \sin(2k\pi) = 1. $$
We obtain these roots geometrically by inscribing a regular n-gon into the unit circle. Note that $\lambda_k = \zeta^k$ for all k, where $\zeta = e^{\frac{2\pi i}{n}}$. Thus the n-th roots of unity form a cyclic group $\{\zeta, \zeta^2, \ldots, \zeta^{n-1}, \zeta^n = 1\}$ of order n.


Now we group all roots of order d together and set
$$ \phi_d(x) := \prod_{\lambda \text{ of order } d} (x - \lambda). $$
Note that the definition of $\phi_d(x)$ is independent of n. Since every root has some order d, we conclude that
$$ x^n - 1 = \prod_{d \mid n} \phi_d(x). \qquad (3) $$
d| 


Here is the crucial observation: The coefficients of the polynomials $\phi_n(x)$ are integers (that is, $\phi_n(x) \in \mathbb{Z}[x]$ for all n), where in addition the constant coefficient is either 1 or −1.
Let us carefully verify this claim. For n = 1 we have 1 as the only root, and so $\phi_1(x) = x - 1$. Now we proceed by induction, where we assume $\phi_d(x) \in \mathbb{Z}[x]$ for all d < n, and that the constant coefficient of $\phi_d(x)$ is 1 or −1. By (3),
$$ x^n - 1 = p(x)\, \phi_n(x), \qquad (4) $$
where $p(x) := \prod_{d \mid n,\, d < n} \phi_d(x)$. By the induction hypothesis we may write
$$ p(x) = \sum_{j=0}^{\ell} p_j x^j \in \mathbb{Z}[x], \qquad \phi_n(x) = \sum_{k=0}^{n-\ell} a_k x^k, \qquad \text{with } p_0 = 1 \text{ or } p_0 = -1. $$
Since $-1 = p_0 a_0$, we see $a_0 \in \{1, -1\}$. Suppose we already know that $a_0, a_1, \ldots, a_{k-1} \in \mathbb{Z}$. Computing the coefficient of $x^k$ on both sides of (4) we find
$$ \sum_{j=0}^{k} p_j a_{k-j} = \sum_{j=1}^{k} p_j a_{k-j} + p_0 a_k \in \mathbb{Z}. $$
By assumption, all $a_0, \ldots, a_{k-1}$ (and all $p_j$) are in $\mathbb{Z}$. Thus $p_0 a_k$, and hence $a_k$, must also be integers, since $p_0$ is 1 or −1.

The roots of unity for n = 6
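The induction above is also an algorithm: knowing $p(x)$ with $p_0 = \pm 1$, the coefficients of $\phi_n$ follow one by one from $p_0 a_k = c_k - \sum_{j \ge 1} p_j a_{k-j}$, where $c_k$ are the coefficients of $x^n - 1$. The following sketch implements exactly this recursion with integer lists (index = power of x); the function names are hypothetical, and exactness of the division is taken from (3) rather than re-checked inside the loop.

```python
def poly_mul(f: list, g: list) -> list:
    """Product of integer polynomials stored as coefficient lists (index = power of x)."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h


def solve_from_constant_term(c: list, p: list) -> list:
    """Coefficients a_0, a_1, ... of phi with c = p * phi, assuming p[0] is 1 or -1.
    This is the recursion from the text: p_0 a_k = c_k - sum_{j=1}^{k} p_j a_{k-j}."""
    assert p[0] in (1, -1)
    a = []
    for k in range(len(c) - len(p) + 1):
        s = sum(p[j] * a[k - j] for j in range(1, min(k, len(p) - 1) + 1))
        a.append((c[k] - s) * p[0])  # dividing by p_0 = +-1 equals multiplying by p_0
    return a


def cyclotomic_table(n_max: int) -> dict:
    """phi_n(x) for n = 1, ..., n_max, built by induction on n from (3) and (4)."""
    table = {}
    for n in range(1, n_max + 1):
        p = [1]
        for d in range(1, n):
            if n % d == 0:
                p = poly_mul(p, table[d])  # p(x) = product of phi_d(x) over d | n, d < n
        x_n_minus_1 = [-1] + [0] * (n - 1) + [1]
        table[n] = solve_from_constant_term(x_n_minus_1, p)
    return table


phi = cyclotomic_table(12)
print(phi[6])   # [1, -1, 1], i.e. x^2 - x + 1
print(phi[12])  # [1, 0, -1, 0, 1], i.e. x^4 - x^2 + 1
```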


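We are ready for the coup de grâce. Let $n_k \mid n$ be one of the numbers appearing in (1). Then
$$ x^n - 1 = \prod_{d \mid n} \phi_d(x) = (x^{n_k} - 1)\, \phi_n(x) \prod_{d \mid n,\ d \nmid n_k,\ d \neq n} \phi_d(x). $$
We conclude that in $\mathbb{Z}$ we have the divisibility relations
$$ \phi_n(q) \mid q^n - 1 \quad \text{and} \quad \phi_n(q) \;\Big|\; \frac{q^n - 1}{q^{n_k} - 1}. \qquad (5) $$
Since (5) holds for all k, we deduce from the class formula (1)
$$ \phi_n(q) \mid q - 1, $$
but this cannot be. Why? We know $\phi_n(x) = \prod (x - \lambda)$, where λ runs through all roots of $x^n - 1$ of order n. Let $\tilde\lambda = a + ib$ be one of those roots. By n > 1 (because of $R \neq Z$) we have $\tilde\lambda \neq 1$, which implies that the real part a is smaller than 1. Now $|\tilde\lambda|^2 = a^2 + b^2 = 1$, and hence
$$ \begin{aligned} |q - \tilde\lambda|^2 &= |q - a - ib|^2 = (q - a)^2 + b^2 \\ &= q^2 - 2aq + a^2 + b^2 = q^2 - 2aq + 1 \\ &> q^2 - 2q + 1 \qquad (\text{because of } a < 1) \\ &= (q - 1)^2, \end{aligned} $$
and so $|q - \tilde\lambda| > q - 1$ holds for all roots of order n. Since each factor exceeds $q - 1 \ge 1$, this implies
$$ |\phi_n(q)| = \prod_{\lambda} |q - \lambda| > q - 1, $$
which means that $\phi_n(q)$ cannot be a divisor of q − 1, contradiction and end of proof.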
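The final estimate is easy to observe numerically. This sketch (an illustration only) evaluates $\phi_n(q) = \prod (q - \lambda)$ over the primitive n-th roots of unity $\lambda = e^{2\pi i k/n}$ with $\gcd(k, n) = 1$, using floating-point complex arithmetic, and compares $|\phi_n(q)|$ with q − 1; the helper name is ours.

```python
import cmath
from math import gcd


def cyclotomic_value(n: int, q: float) -> complex:
    """phi_n(q) as the product of (q - lambda) over the primitive n-th roots of unity."""
    value = 1
    for k in range(1, n + 1):
        if gcd(k, n) == 1:
            value *= q - cmath.exp(2j * cmath.pi * k / n)
    return value


for q in (2, 3, 4, 5):
    for n in (2, 3, 4, 6, 12):
        v = cyclotomic_value(n, q)
        # v is numerically a real integer; the argument above shows |v| > q - 1 for n > 1
        print(q, n, round(v.real), abs(v) > q - 1)
```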
The spectral theorem and
Hadamard’s determinant problem 


A fundamental theorem of linear algebra asserts that every symmetric real matrix A can be diagonalized. That is, for every such matrix A there is a nonsingular real matrix Q such that
$$ Q^{-1} A Q = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix} $$
is in diagonal form. The (real) $\lambda_i$'s are the eigenvalues of A, and the columns of Q form a basis of eigenvectors. We will make use of this result in several chapters to come.

What’s more, the matrix Q can be chosen to be an orthogonal matrix, which means that $Q^T = Q^{-1}$, or equivalently that the columns of Q form an orthonormal basis with respect to the usual inner product.


Theorem 1. For every real symmetric matrix A there is a real orthogonal matrix Q such that $Q^T A Q$ is diagonal.


Moving Q and $Q^T$ to the right-hand side we may equivalently express the theorem as a representation of A as a linear combination of matrices $P_i$ that correspond to projections onto the eigenspaces $E_{\lambda_i}$ (here $\lambda_1, \ldots, \lambda_k$ are the distinct eigenvalues of A),
$$ A = \lambda_1 P_1 + \cdots + \lambda_k P_k, \qquad I_n = P_1 + \cdots + P_k, $$
with $P_i P_j = \delta_{ij} P_i$ for all i and j. In this form the statement is usually called the spectral theorem.

The standard proofs of the theorem proceed by induction on the order of A (with some care in the presence of multiple eigenvalues), construct the basis of eigenvectors step by step, and use the fact that the characteristic polynomial splits into linear factors over the field $\mathbb{C}$ of complex numbers.


The Heine–Borel theorem
Every closed and bounded subset of a vector space $\mathbb{R}^m$ is compact.


The following proof due to Herb Wilf does it in one stroke and is truly 
inspired. It is very different from the usual proofs: It does not even refer to 
the eigenvalues, but instead employs an elegant compactness argument in a 
surprising way. 


Proof. We start with some preliminary facts. Let $O(n) \subseteq \mathbb{R}^{n \times n}$ be the set of real orthogonal matrices of order n. Since
$$ (P^{-1})^T = (P^T)^T = P = (P^{-1})^{-1} \quad \text{and} \quad (PQ)^T = Q^T P^T = Q^{-1} P^{-1} = (PQ)^{-1} $$
for $P, Q \in O(n)$, we see that the set O(n) is a group. Regarding any matrix in $\mathbb{R}^{n \times n}$ as a vector in $\mathbb{R}^{n^2}$, we find that O(n) is a compact set. Indeed, as the columns of an orthogonal matrix $Q = (q_{ij})$ are unit vectors, we have $|q_{ij}| \le 1$ for all i and j, thus O(n) is bounded. Furthermore, the set O(n) is defined as a subset of $\mathbb{R}^{n^2}$ by the equations
$$ x_{i1} x_{j1} + x_{i2} x_{j2} + \cdots + x_{in} x_{jn} = \delta_{ij} \quad \text{for } 1 \le i, j \le n, $$
hence it is closed, and thus compact.


For any real square matrix A let $\mathrm{Od}(A) := \sum_{i \neq j} a_{ij}^2$ be the sum of the squares of the off-diagonal entries. Suppose we can prove the following.

Lemma. If A is a real symmetric n × n matrix that is not diagonal, that is, Od(A) > 0, then there exists $U \in O(n)$ such that $\mathrm{Od}(U^T A U) < \mathrm{Od}(A)$.


Given the lemma, the theorem follows in three quick steps. Let A be a real symmetric n × n matrix.

(A) Consider the map $f_A : O(n) \to \mathbb{R}^{n \times n}$ with $f_A(P) := P^T A P$. The map $f_A$ is continuous on the compact set O(n), and so the image $f_A(O(n))$ is compact.

(B) The function $\mathrm{Od} : f_A(O(n)) \to \mathbb{R}$ is continuous, hence it assumes a minimum, say at $D = Q^T A Q \in f_A(O(n))$.

(C) The value Od(D) must be zero, and hence D is a diagonal matrix as required.

Indeed, if Od(D) > 0, then applying the Lemma to the symmetric matrix D we find $U \in O(n)$ with $\mathrm{Od}(U^T D U) < \mathrm{Od}(D)$. But
$$ U^T D U = U^T Q^T A Q U = (QU)^T A (QU) $$
is in $f_A(O(n))$ (remember O(n) is a group!) with Od-value smaller than that of D: contradiction, and end of proof.

It remains to prove the lemma, and for this we use a very clever method attributed to Carl Gustav Jacob Jacobi. Suppose that $a_{rs} \neq 0$ for some $r \neq s$. Then we claim that the matrix U that agrees with the identity matrix except that $u_{rr} = u_{ss} = \cos\vartheta$, $u_{rs} = \sin\vartheta$, $u_{sr} = -\sin\vartheta$ does the job, for some choice of the (real) angle ϑ:
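$$ U = \begin{pmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & \cos\vartheta & \cdots & \sin\vartheta & & \\
& & \vdots & \ddots & \vdots & & \\
& & -\sin\vartheta & \cdots & \cos\vartheta & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{pmatrix}, $$
where the entries involving ϑ sit in rows r, s and columns r, s.

“Diagonalizing by applying a rotation and removing off-diagonal elements”

Clearly, U is orthogonal for any ϑ.
Now let us compute the (k, ℓ)-entry $b_{k\ell}$ of $U^T A U$. We have
$$ b_{k\ell} = \sum_{i,j} u_{ik}\, a_{ij}\, u_{j\ell}. \qquad (1) $$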


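For $k, \ell \notin \{r, s\}$ we get $b_{k\ell} = a_{k\ell}$. Furthermore, we have
$$ \begin{aligned} b_{kr} &= \sum_{i=1}^{n} u_{ik} \sum_{j=1}^{n} a_{ij} u_{jr} = \sum_{i=1}^{n} u_{ik} (a_{ir} \cos\vartheta - a_{is} \sin\vartheta) \\ &= a_{kr} \cos\vartheta - a_{ks} \sin\vartheta \qquad (\text{for } k \neq r, s). \end{aligned} $$
Similarly, one computes
$$ b_{ks} = a_{kr} \sin\vartheta + a_{ks} \cos\vartheta \qquad (\text{for } k \neq r, s). $$
It follows that
$$ \begin{aligned} b_{kr}^2 + b_{ks}^2 &= a_{kr}^2 \cos^2\vartheta - 2 a_{kr} a_{ks} \cos\vartheta \sin\vartheta + a_{ks}^2 \sin^2\vartheta \\ &\quad + a_{kr}^2 \sin^2\vartheta + 2 a_{kr} a_{ks} \sin\vartheta \cos\vartheta + a_{ks}^2 \cos^2\vartheta \\ &= a_{kr}^2 + a_{ks}^2, \end{aligned} $$
and by symmetry
$$ b_{r\ell}^2 + b_{s\ell}^2 = a_{r\ell}^2 + a_{s\ell}^2 \qquad (\text{for } \ell \neq r, s). $$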


We conclude that the function Od, which sums the squares of the off-diagonal values, agrees for A and $U^T A U$ except for the entries at (r, s) and (s, r), for any ϑ. To conclude the proof we now show that $\vartheta_0$ can be chosen so as to make $b_{rs} = 0$ (and hence also $b_{sr} = 0$, since $U^T A U$ is symmetric), which will result in
$$ \mathrm{Od}(U^T A U) = \mathrm{Od}(A) - 2 a_{rs}^2 < \mathrm{Od}(A) $$
as required.
Jacques Hadamard


Using (1) we find
$$ b_{rs} = (a_{rr} - a_{ss}) \sin\vartheta \cos\vartheta + a_{rs} (\cos^2\vartheta - \sin^2\vartheta). $$
For ϑ = 0 this becomes $a_{rs}$, while for $\vartheta = \frac{\pi}{2}$ it is $-a_{rs}$. Hence by the intermediate value theorem there is some $\vartheta_0$ between 0 and $\frac{\pi}{2}$ such that $b_{rs} = 0$, and we are through.
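Jacobi's rotation can be tried out directly. The sketch below is a hedged illustration, not the book's argument. First, instead of the intermediate value theorem it picks the angle in closed form: writing $b_{rs} = \frac{a_{rr} - a_{ss}}{2}\sin 2\vartheta + a_{rs}\cos 2\vartheta$, one may take $2\vartheta_0 = \operatorname{atan2}(2a_{rs},\, a_{ss} - a_{rr})$. Second, it repeats the step on the largest off-diagonal entry, which is the classical Jacobi eigenvalue iteration; the proof above used compactness instead of iterating. All function names and the tolerance are our own choices.

```python
import math


def od(a):
    """Od(A): sum of the squares of the off-diagonal entries."""
    n = len(a)
    return sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)


def rotate(a, r, s, theta):
    """B = U^T A U for the rotation U of the text (u_rr = u_ss = cos, u_rs = sin, u_sr = -sin)."""
    n = len(a)
    u = [[float(i == j) for j in range(n)] for i in range(n)]
    c, t = math.cos(theta), math.sin(theta)
    u[r][r] = u[s][s] = c
    u[r][s], u[s][r] = t, -t
    # b_kl = sum_{i,j} u_ik a_ij u_jl, which is formula (1)
    return [[sum(u[i][k] * a[i][j] * u[j][l] for i in range(n) for j in range(n))
             for l in range(n)] for k in range(n)]


def angle(a, r, s):
    """An angle with b_rs = 0, solving ((a_rr - a_ss)/2) sin 2t + a_rs cos 2t = 0."""
    return 0.5 * math.atan2(2 * a[r][s], a[s][s] - a[r][r])


def jacobi_diagonalize(a, tol=1e-12, max_steps=1000):
    """Repeatedly remove the largest off-diagonal entry; each step lowers Od by 2 a_rs^2."""
    n = len(a)
    for _ in range(max_steps):
        if od(a) <= tol:
            break
        r, s = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda ij: abs(a[ij[0]][ij[1]]))
        a = rotate(a, r, s, angle(a, r, s))
    return a


A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
B = rotate(A, 0, 1, angle(A, 0, 1))
print(od(A), od(B))                           # Od drops by 2 * a_01^2 = 2
D = jacobi_diagonalize(A)
print(sorted(round(D[i][i], 6) for i in range(3)))  # approximately the eigenvalues of A
```

For this A the eigenvalues are $3 - \sqrt{3}$, 3 and $3 + \sqrt{3}$, which the diagonal of D approximates.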


So this was beautiful, and we want to immediately apply the theorem to a 
famous (and unsolved) problem. 


The Hadamard determinant problem
How large can det A be on the set of all real n × n matrices $A = (a_{ij})$ with $|a_{ij}| \le 1$ for all i and j?


Since the determinant is a continuous function in the $a_{ij}$ (considered as variables) and the matrices form a compact set in $\mathbb{R}^{n^2}$, this maximum must exist. Furthermore, the maximum is attained for some matrix all of whose entries are +1 or −1, because the function det A is linear in each single entry $a_{ij}$ (if we keep all other entries fixed). Thus we can start with any matrix A and move one entry after the other to +1 or to −1, in every single step not decreasing the determinant, until we arrive at a ±1-matrix. In the search for the largest determinant we may thus assume that all entries of A are ±1.

Here is the trick: Instead of A we consider the matrix $B = A^T A = (b_{ij})$. That is, if $c_j = (a_{1j}, a_{2j}, \ldots, a_{nj})^T$ denotes the j-th column vector of A, then $b_{ij} = \langle c_i, c_j \rangle$, the inner product of $c_i$ and $c_j$. In particular,
$$ b_{ii} = \langle c_i, c_i \rangle = n \quad \text{for all } i, $$
and
$$ \operatorname{trace} B = \sum_{i=1}^{n} b_{ii} = n^2, \qquad (2) $$
which will come in handy in a moment.


Now we can go to work. First of all, from $B = A^T A$ we get $|\det A| = \sqrt{\det B}$. Since multiplication of a column of A by −1 turns det A into −det A, we see that the maximum problem for det A is the same as for det B. Furthermore, we may assume that A is nonsingular, and hence that B is nonsingular as well.

Since $B = A^T A$ is a symmetric matrix, the spectral theorem tells us that for some $Q \in O(n)$,
$$ Q^T B Q = Q^T A^T A Q = (AQ)^T (AQ) = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}, \qquad (3) $$
where the $\lambda_i$ are the eigenvalues of B. Now, if $d_j$ denotes the j-th column vector of AQ (which is nonzero since A is nonsingular), then
$$ \lambda_j = d_j^T d_j = |d_j|^2 > 0. $$
Thus $\lambda_1, \ldots, \lambda_n$ are positive real numbers and
$$ \det B = \lambda_1 \cdots \lambda_n, \qquad \operatorname{trace} B = \sum_{i=1}^{n} \lambda_i. $$


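Whenever such a product and sum of positive numbers turn up, it is always a good idea to try the arithmetic-geometric mean inequality (see Chapter 20). In our case this gives with (2)
$$ \det B = \lambda_1 \cdots \lambda_n \le \left(\frac{\lambda_1 + \cdots + \lambda_n}{n}\right)^n = \left(\frac{\operatorname{trace} B}{n}\right)^n = n^n, \qquad (4) $$
and out comes Hadamard's upper bound
$$ |\det A| \le n^{n/2}. \qquad (5) $$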
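For very small n the maximum can simply be found by exhaustive search and compared with the bound (5). The sketch below (our own illustration; feasible only for n ≤ 4 or so, since the search is exponential) uses the observation from the text that multiplying a row or column by −1 only flips the sign of det A, so the first row and first column may be fixed to all +1.

```python
from itertools import permutations, product


def signed_permutations(n: int) -> list:
    """All permutations of range(n) together with their signs (from the inversion count)."""
    result = []
    for sigma in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
        result.append((sigma, -1 if inversions % 2 else 1))
    return result


def max_abs_det(n: int) -> int:
    """Largest |det A| over all n x n matrices with entries +-1.
    The first row and first column are fixed to +1; the other (n-1)^2 entries vary."""
    perms = signed_permutations(n)
    best = 0
    for free in product((1, -1), repeat=(n - 1) ** 2):
        a = [[1] * n] + [[1] + list(free[i * (n - 1):(i + 1) * (n - 1)]) for i in range(n - 1)]
        det = 0
        for sigma, sign in perms:
            term = sign
            for i in range(n):
                term *= a[i][sigma[i]]
            det += term
        best = max(best, abs(det))
    return best


for n in range(1, 5):
    print(n, max_abs_det(n), round(n ** (n / 2), 3))  # exhaustive maximum vs. the bound (5)
```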


When do we have equality in (5) or, what is the same, in (4)? Easy enough: if and only if the geometric mean of the $\lambda_i$'s equals the arithmetic mean, or equivalently, if and only if $\lambda_1 = \cdots = \lambda_n = \lambda$. But then $\operatorname{trace} B = n\lambda = n^2$, and so $\lambda_1 = \cdots = \lambda_n = n$. Looking at (3) this means $Q^T B Q = n I_n$, where $I_n$ is the n × n identity matrix. Now recall $Q^T = Q^{-1}$, multiply by Q on the left, by $Q^{-1}$ on the right, to obtain
$$ B = n I_n. $$
Going back to A this means that
$$ |\det A| = n^{n/2} \iff \langle c_i, c_j \rangle = 0 \ \text{ for } i \neq j. \qquad (6) $$


Matrices A with ±1-entries that achieve equality in (5) are aptly called Hadamard matrices. So an n × n matrix A with ±1-entries is a Hadamard matrix if and only if
$$ A^T A = A A^T = n I_n. $$
This leads to another unsolved and apparently very difficult problem:

For which n does a Hadamard matrix of size n × n exist?


A short argument shows that if n is greater than 2, then it must be a multiple of 4. Indeed, suppose that A is an n × n Hadamard matrix, n > 2, whose rows are the vectors $r_1, \ldots, r_n$. Clearly, multiplication of any row or column by −1 gives another Hadamard matrix. So we may assume that the first row consists of 1's only. Since $\langle r_1, r_i \rangle = 0$ for $i \neq 1$, every other row must contain $\frac{n}{2}$ 1's and $\frac{n}{2}$ −1's; in particular, n must be even. Assume now that n > 2 and consider rows $r_2$ and $r_3$, and denote by a, b, c, d the numbers of columns that have the entry pairs (1, 1), (1, −1), (−1, 1), and (−1, −1) in rows 2 and 3, respectively. Then from $\langle r_1, r_2 \rangle = 0$ and $\langle r_1, r_3 \rangle = 0$ we get
$$ a + b = c + d = a + c, $$
which gives b = c, a = d. But from $\langle r_2, r_3 \rangle = 0$ we also have a + d = b + c, resulting in 2a = 2b. We conclude that $a = b = c = d = \frac{n}{4}$. Thus the order of the Hadamard matrix is either n = 1, n = 2, or n = a + b + c + d = 4a, a multiple of 4.

Statements (5) and (6) form an instance of Hadamard's inequality: The absolute value of the determinant of a matrix is at most the product of the lengths of its columns, with equality (for nonzero columns) if and only if the columns are pairwise orthogonal.

Does a Hadamard matrix exist for all n = 4a? No one knows. The answer is yes for n up to the current record n = 664, and for certain infinite series such as the powers of 2 (see the box). But the general answer seems at present out of reach.

Hadamard matrices exist for all n = 2^m
Consider an m-set X and index the $2^m$ subsets $C \subseteq X$ in any way $C_1, \ldots, C_{2^m}$. The matrix $A = (a_{ij})$ is defined as
$$ a_{ij} := (-1)^{|C_i \cap C_j|}. $$
We want to verify $\langle r_i, r_j \rangle = 0$ for $i \neq j$. From the definition,
$$ \langle r_i, r_j \rangle = \sum_k (-1)^{|C_i \cap C_k| + |C_j \cap C_k|}. \qquad (*) $$
Now, as $C_i \neq C_j$ there exists an element $a \in X$ with $a \in C_i \setminus C_j$ or $a \in C_j \setminus C_i$; suppose $a \in C_i \setminus C_j$. Half the subsets of X contain a, and half do not. Let C run through all subsets that contain a; then the pairs $\{C, C \setminus a\}$ will comprise all subsets of X. But for each such pair $\{C, C \setminus a\}$, the numbers $|C_i \cap C| + |C_j \cap C|$ and $|C_i \cap (C \setminus a)| + |C_j \cap (C \setminus a)|$ have different parity, and so the corresponding terms in (*) will sum to 0. But then the whole sum is 0, as required.

For n = 4, with the numbering $C_1 = \emptyset$, $C_2 = \{1\}$, $C_3 = \{2\}$, $C_4 = \{1, 2\}$ this yields the matrix
$$ \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}. $$

Optimal matrices for n = 2, 3, and 4, with determinants 2, 4, and 16.
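The subset construction for $n = 2^m$ translates directly into code if each subset of an m-set is encoded as the bit pattern of its index, so that $|C_i \cap C_j|$ is the number of 1-bits in `i & j`. This encoding is our assumption (any indexing of the subsets works, as the box says); with it, the case m = 2 reproduces the 4 × 4 matrix shown above, and the sketch checks $A A^T = n I_n$ for several m.

```python
def subset_hadamard(m: int) -> list:
    """2^m x 2^m matrix a_ij = (-1)^{|C_i ∩ C_j|}, subsets C_i encoded by the bits of i."""
    n = 2 ** m
    return [[(-1) ** bin(i & j).count("1") for j in range(n)] for i in range(n)]


def is_hadamard(a: list) -> bool:
    """Check A A^T = n I_n: distinct rows are orthogonal, each row has squared length n."""
    n = len(a)
    return all(
        sum(a[i][k] * a[j][k] for k in range(n)) == (n if i == j else 0)
        for i in range(n) for j in range(n)
    )


for row in subset_hadamard(2):
    print(row)
print(is_hadamard(subset_hadamard(5)))  # True
```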


For n = 4a we have thus reduced the original problem to the existence of Hadamard matrices. But how large can det A be when n is not a multiple of 4? This is again a hard problem, but maybe we can find a good lower bound for the maximum. Here is a method that often proves successful, and it does in our case.

Let us look at all $2^{n^2}$ matrices with ±1-entries and consider some averages of the determinant. The arithmetic mean $\frac{1}{2^{n^2}} \sum_A \det A$ is 0 (clear?), so this is no big help. But if we consider the mean square average instead,
$$ D_n := \sqrt{\frac{1}{2^{n^2}} \sum_A (\det A)^2}, $$
then things brighten up. Clearly,
$$ \max_A \det A \ge D_n, $$
so this will give us a lower bound for the maximum.


The following stunningly simple calculation of $D_n^2$ probably appeared first in an article by George Szekeres and Paul Turán. We learnt it from a beautiful paper of Herb Wilf who heard it from Mark Kac. In the words of Mark Kac: “Just write $(\det A)^2$ out twice, interchange summation, and everything simplifies.” So we want to do just that.


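From the definition of the determinant we get
$$ \begin{aligned} D_n^2 &= \frac{1}{2^{n^2}} \sum_A \Big( \sum_{\sigma} (\operatorname{sign}\sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)} \Big)^2 \\ &= \frac{1}{2^{n^2}} \sum_A \sum_{\sigma} \sum_{\tau} (\operatorname{sign}\sigma)(\operatorname{sign}\tau)\, a_{1\sigma(1)} a_{1\tau(1)} \cdots a_{n\sigma(n)} a_{n\tau(n)}, \end{aligned} $$
where σ and τ run independently through all permutations of {1, …, n}. Interchange of summation yields
$$ D_n^2 = \frac{1}{2^{n^2}} \sum_{\sigma, \tau} (\operatorname{sign}\sigma)(\operatorname{sign}\tau) \Big( \sum_A a_{1\sigma(1)} a_{1\tau(1)} \cdots a_{n\sigma(n)} a_{n\tau(n)} \Big). $$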


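This doesn’t look too promising, but wait. Look at a fixed pair (σ, τ). The inner sum $\sum_A$ is really a summation over $n^2$ variables, one for each $a_{ij}$:
$$ \sum_{a_{11} = \pm 1} \sum_{a_{12} = \pm 1} \cdots \sum_{a_{nn} = \pm 1} a_{1\sigma(1)} a_{1\tau(1)} \cdots a_{n\sigma(n)} a_{n\tau(n)}. \qquad (7) $$
Suppose $\sigma(i) = k \neq \tau(i)$. Then every summand contains $a_{ik}$ exactly once (to the first power), and therefore the whole sum has the factor $\sum_{a_{ik} = \pm 1} a_{ik} = 0$, and hence is 0 as well. The only way that the sum fails to be 0 is when σ = τ, and everything simplifies indeed: For σ = τ, the product inside (7) is $a_{1\sigma(1)}^2 \cdots a_{n\sigma(n)}^2 = 1$, as is the term $(\operatorname{sign}\sigma)^2$. The sum in (7) is therefore
$$ \sum_{a_{11} = \pm 1} \cdots \sum_{a_{nn} = \pm 1} 1 = 2^{n^2}, $$
and wrapping things up we obtain
$$ D_n^2 = \frac{1}{2^{n^2}} \sum_{\sigma} 2^{n^2} = n!, $$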


and thus the following result. 


Theorem 2. There exists an n × n matrix with entries ±1 whose determinant is greater than $\sqrt{n!}$.
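For tiny n the averages can be computed by listing all $2^{n^2}$ sign matrices, which confirms that the arithmetic mean of det A is 0 and that the mean of $(\det A)^2$ is exactly n!. This brute-force sketch is exponential and meant only for n ≤ 3; exact `Fraction` averages avoid rounding. The function names are ours.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def signed_permutations(n):
    """Pairs (sigma, sign sigma) for all permutations of range(n)."""
    result = []
    for sigma in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
        result.append((sigma, -1 if inversions % 2 else 1))
    return result


def determinant_moments(n):
    """Exact averages of det A and (det A)^2 over all 2^(n^2) matrices with entries +-1."""
    perms = signed_permutations(n)
    total, total_sq, count = 0, 0, 0
    for entries in product((1, -1), repeat=n * n):
        a = [entries[i * n:(i + 1) * n] for i in range(n)]
        det = 0
        for sigma, sign in perms:
            term = sign
            for i in range(n):
                term *= a[i][sigma[i]]
            det += term
        total += det
        total_sq += det * det
        count += 1
    return Fraction(total, count), Fraction(total_sq, count)


for n in range(1, 4):
    mean, mean_sq = determinant_moments(n)
    print(n, mean, mean_sq, factorial(n))  # mean is 0 and mean_sq equals n!
```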


It is a characteristic feature of averaging that, while we learn that such a matrix exists, we have no clue how to construct it efficiently. But, surprisingly, the bound is quite good. Invoking Stirling's formula from page 13 we have
$$ \sqrt{n!} \sim (2\pi n)^{1/4} \left(\frac{n}{e}\right)^{n/2}, $$
and this is not too bad in comparison to the upper bound $n^{n/2}$. Using the biquadratic mean average Szekeres and Turán got the even better lower bound of order $\sqrt{n!}\,\sqrt{n}$, but the correct growth for the maximum as n goes to infinity is still not known.
Chapter 8

“π is irrational”

This was already conjectured by Aristotle, when he claimed that diameter and circumference of a circle are not commensurable. The first proof of this fundamental fact was given by Johann Heinrich Lambert in 1766. In fact, Lambert even showed that tan r is irrational for rational r ≠ 0; the irrationality of π follows from this since $\tan\frac{\pi}{4} = 1$. Our Book Proof is due to Ivan Niven, 1947: an extremely elegant one-page proof that needs only elementary calculus. Its idea is powerful, and quite a bit more can be derived from it, as was shown by Iwamoto and Koksma, respectively:
• $\pi^2$ is irrational and
• $e^r$ is irrational for rational r ≠ 0.

Niven's method does, however, have its roots and predecessors: It can be traced back to the classical paper by Charles Hermite from 1873 which first established that e is transcendental, that is, that e is not a zero of a polynomial with rational coefficients.

Before we treat π we will look at e and its powers, and see that these are irrational. This is much easier, and we thus also follow the historical order in the development of the results.

Charles Hermite


To start with, it is rather easy to see (as did Fourier in 1815) that $e = \sum_{k \ge 0} \frac{1}{k!} = 2.7182\ldots$ is irrational. Indeed, if we had $e = \frac{a}{b}$ for integers a and b > 0, then we would get
$$ n!\, b\, e = n!\, a $$
for every n ≥ 0. But this cannot be true, because on the right-hand side we have an integer, while the left-hand side with
$$ e = \Big(1 + \frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{n!}\Big) + \Big(\frac{1}{(n+1)!} + \frac{1}{(n+2)!} + \frac{1}{(n+3)!} + \cdots\Big) $$
decomposes into an integral part
$$ b\, n! \Big(1 + \frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{n!}\Big) $$
and a second part
$$ b \Big(\frac{1}{n+1} + \frac{1}{(n+1)(n+2)} + \frac{1}{(n+1)(n+2)(n+3)} + \cdots\Big), $$
which is approximately $\frac{b}{n}$, so that for large n it certainly cannot be integral: It is larger than $\frac{b}{n+1}$ and smaller than $\frac{b}{n}$, as one can see from a comparison with a geometric series:
$$ \frac{1}{n+1} < \frac{1}{n+1} + \frac{1}{(n+1)(n+2)} + \frac{1}{(n+1)(n+2)(n+3)} + \cdots < \frac{1}{n+1} + \frac{1}{(n+1)^2} + \frac{1}{(n+1)^3} + \cdots = \frac{1}{n}. $$

Geometric series
For the infinite geometric series
$$ Q = \frac{1}{q} + \frac{1}{q^2} + \frac{1}{q^3} + \cdots $$
with q > 1 we clearly have
$$ qQ = 1 + \frac{1}{q} + \frac{1}{q^2} + \cdots = 1 + Q $$
and thus
$$ Q = \frac{1}{q - 1}. $$

192 JOURNAL DE MATHÉMATIQUES
ON THE IRRATIONALITY OF THE NUMBER e = 2.718...
By J. LIOUVILLE
It is proved in the elements that the number e, the base of Napierian logarithms, does not have a rational value. One should, it seems to me, add that the same method also proves that e cannot be a root of an equation of the second degree with rational coefficients, so that one cannot have $ae + \frac{b}{e} = c$, a being a positive integer and b, c positive or negative integers. Indeed, if in this equation one replaces e and $\frac{1}{e}$ or $e^{-1}$ by their expansions deduced from that of $e^x$, and then multiplies both sides by $1 \cdot 2 \cdot 3 \cdots n$, one will easily find
$$ \frac{a}{n+1}\Big(1 + \frac{1}{n+2} + \cdots\Big) \pm \frac{b}{n+1}\Big(1 - \frac{1}{n+2} + \cdots\Big) = \mu, $$
μ being an integer. One can always arrange that the factor ±b be positive; it suffices to take n even if b is < 0 and n odd if b is > 0; taking moreover n very large, the equation we have just written will then lead to an absurdity; for its first member, being essentially positive and very small, will lie between 0 and 1, and cannot be equal to an integer μ. Hence, etc.

Liouville's paper
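The two bounds on the remainder can be checked with exact rational arithmetic. Since a program can only add finitely many terms, this sketch sums the first 40 terms of $n!\sum_{k>n} 1/k!$ (an arbitrary cutoff of ours), which gives a lower approximation of the full remainder; it still lands strictly between $\frac{1}{n+1}$ and $\frac{1}{n}$, as the comparison with the geometric series predicts.

```python
from fractions import Fraction


def truncated_tail(n: int, extra_terms: int = 40) -> Fraction:
    """n! * (1/(n+1)! + ... + 1/(n+extra_terms)!), computed exactly.
    This is a lower bound for the full remainder n! * sum_{k > n} 1/k!."""
    total, term = Fraction(0), Fraction(1)
    for k in range(n + 1, n + extra_terms + 1):
        term /= k  # now term == n!/k!
        total += term
    return total


for n in (1, 2, 5, 10, 20):
    t = truncated_tail(n)
    # t > 1/(n+1): t contains the first term 1/(n+1) plus further positive terms;
    # t < 1/n: t is below the full remainder, which the geometric series bounds by 1/n.
    print(n, float(t), Fraction(1, n + 1) < t < Fraction(1, n))
```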


Now one might be led to think that this simple multiply-by-n! trick is not even sufficient to show that $e^2$ is irrational. This is a stronger statement: $\sqrt{2}$ is an example of a number which is irrational, but whose square is not.

From John Cosgrave we have learned that with two nice ideas/observations (let's call them “tricks”) one can get two steps further nevertheless: Each of the tricks is sufficient to show that $e^2$ is irrational, the combination of both of them even yields the same for $e^4$. The first trick may be found in a one-page paper by J. Liouville from 1840, and the second one in a two-page “addendum” which Liouville published on the next two journal pages.


Why is $e^2$ irrational? What can we derive from $e^2 = \frac{a}{b}$? According to Liouville we should write this as
$$ b\,e = a\,e^{-1}, $$
substitute the series
$$ e = 1 + \frac{1}{1} + \frac{1}{2} + \frac{1}{6} + \frac{1}{24} + \frac{1}{120} + \cdots $$
and
$$ e^{-1} = 1 - \frac{1}{1} + \frac{1}{2} - \frac{1}{6} + \frac{1}{24} - \frac{1}{120} \pm \cdots, $$
and then multiply by n!, for a sufficiently large even n. Then we see that $n!\,b\,e$ is nearly integral:
$$ n!\, b \Big(1 + \frac{1}{1} + \frac{1}{2} + \frac{1}{6} + \cdots + \frac{1}{n!}\Big) $$
is an integer, and the rest
$$ n!\, b \Big(\frac{1}{(n+1)!} + \frac{1}{(n+2)!} + \cdots\Big) $$
is approximately $\frac{b}{n}$: It is larger than $\frac{b}{n+1}$, but smaller than $\frac{b}{n}$, as we have seen above.

At the same time $n!\,a\,e^{-1}$ is nearly integral as well: Again we get a large integral part, and then a rest
$$ (-1)^{n+1}\, n!\, a \Big(\frac{1}{(n+1)!} - \frac{1}{(n+2)!} + \frac{1}{(n+3)!} \mp \cdots\Big), $$
and this is approximately $(-1)^{n+1}\frac{a}{n}$. More precisely: for even n the rest is larger than $-\frac{a}{n}$, but smaller than
$$ -a \Big(\frac{1}{n+1} - \frac{1}{(n+1)^2} - \frac{1}{(n+1)^3} - \cdots\Big) < 0. $$
But this cannot be true, since for large even n it would imply that $n!\,a\,e^{-1}$ is just a bit smaller than an integer, while $n!\,b\,e$ is a bit larger than an integer, so $n!\,a\,e^{-1} = n!\,b\,e$ cannot hold.


In order to show that $e^4$ is irrational, we now courageously assume that $e^4 = \frac{a}{b}$ were rational, and write this as
$$ b\,e^2 = a\,e^{-2}. $$
We could now try to multiply this by n! for some large n, and collect the non-integral summands, but this leads to nothing useful: The sum of the remaining terms on the left-hand side will be approximately $b\,\frac{2^{n+1}}{n}$, on the right side $(-1)^{n+1} a\,\frac{2^{n+1}}{n}$, and both will be very large if n gets large.

So one has to examine the situation a bit more carefully, and make two little adjustments to the strategy: First we will not take an arbitrary large n, but a large power of two, $n = 2^m$; and secondly we will not multiply by n!, but by $\frac{n!}{2^{n-1}}$. Then we need a little lemma, a special case of Legendre's theorem (see page 10): For any n ≥ 1 the integer n! contains the prime factor 2 at most n − 1 times, with equality if (and only if) n is a power of two, $n = 2^m$.


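This lemma is not hard to show: $\lfloor \frac{n}{2} \rfloor$ of the factors of n! are even, $\lfloor \frac{n}{4} \rfloor$ of them are divisible by 4, and so on. So if $2^k$ is the largest power of two which satisfies $2^k \le n$, then n! contains the prime factor 2 exactly
$$ \Big\lfloor \frac{n}{2} \Big\rfloor + \Big\lfloor \frac{n}{4} \Big\rfloor + \cdots + \Big\lfloor \frac{n}{2^k} \Big\rfloor \le \frac{n}{2} + \frac{n}{4} + \cdots + \frac{n}{2^k} = n\Big(1 - \frac{1}{2^k}\Big) \le n - 1 $$
times, with equality in both inequalities exactly if $n = 2^k$.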
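The little lemma is quickly confirmed for small n. This sketch (ours) counts the factor 2 in n! exactly as in the proof, via $\lfloor n/2 \rfloor + \lfloor n/4 \rfloor + \cdots$, and compares the count with n − 1.

```python
def two_adic_valuation_of_factorial(n: int) -> int:
    """Exponent of 2 in n!, counted as floor(n/2) + floor(n/4) + ... (Legendre)."""
    count, power = 0, 2
    while power <= n:
        count += n // power
        power *= 2
    return count


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


for n in range(1, 17):
    v = two_adic_valuation_of_factorial(n)
    # the lemma: v <= n - 1, with equality exactly for n = 1, 2, 4, 8, 16
    print(n, v, v <= n - 1, v == n - 1, is_power_of_two(n))
```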


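Let’s get back to $b\,e^2 = a\,e^{-2}$. We are looking at
$$ b\,\frac{n!}{2^{n-1}}\,e^2 = a\,\frac{n!}{2^{n-1}}\,e^{-2} \qquad (1) $$
and substitute the series
$$ e^2 = 1 + \frac{2}{1} + \frac{4}{2} + \frac{8}{6} + \cdots + \frac{2^r}{r!} + \cdots $$
and
$$ e^{-2} = 1 - \frac{2}{1} + \frac{4}{2} - \frac{8}{6} \pm \cdots + (-1)^r \frac{2^r}{r!} + \cdots $$
For r ≤ n we get integral summands on both sides, namely
$$ b\,\frac{n!}{2^{n-1}}\,\frac{2^r}{r!} \quad \text{resp.} \quad (-1)^r a\,\frac{n!}{2^{n-1}}\,\frac{2^r}{r!}, $$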
where for r > 0 the denominator r! contains the prime factor 2 at most r − 1 times, while n! contains it exactly n − 1 times. (So for r > 0 the summands are even.)
And since n is even (we assume that $n = 2^m$), the series that we get for r ≥ n + 1 are
$$ 2b\Big(\frac{2}{n+1} + \frac{4}{(n+1)(n+2)} + \frac{8}{(n+1)(n+2)(n+3)} + \cdots\Big) $$
resp.
$$ 2a\Big(-\frac{2}{n+1} + \frac{4}{(n+1)(n+2)} - \frac{8}{(n+1)(n+2)(n+3)} \pm \cdots\Big). $$
These series will for large n be roughly $\frac{4b}{n}$ resp. $-\frac{4a}{n}$, as one sees again by comparison with geometric series. For large $n = 2^m$ this means that the left-hand side of (1) is a bit larger than an integer, while the right-hand side is a bit smaller: contradiction!

The estimate $n! \ge e\left(\frac{n}{e}\right)^n$ yields an explicit n that is “large enough.”
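The key arithmetic fact in this argument, that for $n = 2^m$ every summand $\frac{n!}{2^{n-1}}\frac{2^r}{r!}$ with r ≤ n is an integer and is even for r > 0, can be checked with exact fractions. The sketch below (our own illustration) also shows that the trick fails for n = 6, which is not a power of two.

```python
from fractions import Fraction
from math import factorial


def scaled_summands(n: int) -> list:
    """The summands (n!/2^(n-1)) * 2^r / r! for r = 0, ..., n, as exact fractions."""
    scale = Fraction(factorial(n), 2 ** (n - 1))
    return [scale * Fraction(2 ** r, factorial(r)) for r in range(n + 1)]


def all_integral_and_even(n: int) -> bool:
    s = scaled_summands(n)
    return all(x.denominator == 1 for x in s) and all(x.numerator % 2 == 0 for x in s[1:])


for m in range(1, 7):
    print(2 ** m, all_integral_and_even(2 ** m))  # True for every power of two
print(6, all_integral_and_even(6))               # False: 6!/2^5 is not an integer
```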


So we know that $e^4$ is irrational; to show that $e^3$, $e^5$ etc. are irrational as well, we need heavier machinery (that is, a bit of calculus), and a new idea, which essentially goes back to Charles Hermite, and for which the key is hidden in the following simple lemma.


Lemma. For some fixed n ≥ 1, let
$$ f(x) = \frac{x^n (1 - x)^n}{n!}. $$
(i) The function f(x) is a polynomial of the form $f(x) = \frac{1}{n!} \sum_{i=n}^{2n} c_i x^i$, where the coefficients $c_i$ are integers.
(ii) For 0 < x < 1 we have $0 < f(x) < \frac{1}{n!}$.
(iii) The derivatives $f^{(k)}(0)$ and $f^{(k)}(1)$ are integers for all k ≥ 0.


Proof. Parts (i) and (ii) are clear.

For (iii) note that by (i) the k-th derivative $f^{(k)}$ vanishes at x = 0 unless n ≤ k ≤ 2n, and in this range $f^{(k)}(0) = \frac{k!}{n!} c_k$ is an integer. From f(x) = f(1 − x) we get $f^{(k)}(x) = (-1)^k f^{(k)}(1 - x)$ for all x, and hence $f^{(k)}(1) = (-1)^k f^{(k)}(0)$, which is an integer.
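All three parts of the lemma can be verified for small n with exact rational arithmetic. This sketch (ours) stores f as a list of `Fraction` coefficients, differentiates term by term, and evaluates every derivative at 0 and at 1.

```python
from fractions import Fraction
from math import comb, factorial


def f_coefficients(n: int) -> list:
    """Coefficients of f(x) = x^n (1-x)^n / n!, index = power of x (exact fractions)."""
    coeffs = [Fraction(0)] * n
    coeffs += [Fraction((-1) ** j * comb(n, j), factorial(n)) for j in range(n + 1)]
    return coeffs


def derivative(coeffs: list) -> list:
    return [i * c for i, c in enumerate(coeffs)][1:] or [Fraction(0)]


def evaluate(coeffs: list, x) -> Fraction:
    return sum((c * x ** i for i, c in enumerate(coeffs)), Fraction(0))


def derivative_values(n: int) -> list:
    """(f^(k)(0), f^(k)(1)) for k = 0, ..., 2n + 1; the lemma says all are integers."""
    values, c = [], f_coefficients(n)
    for _ in range(2 * n + 2):
        values.append((evaluate(c, 0), evaluate(c, 1)))
        c = derivative(c)
    return values


for k, (at0, at1) in enumerate(derivative_values(3)):
    print(k, at0, at1)  # integers, with f^(k)(1) = (-1)^k f^(k)(0)
```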


Theorem 1. $e^r$ is irrational for every $r \in \mathbb{Q}\setminus\{0\}$.


Proof. It suffices to show that $e^s$ cannot be rational for a positive integer $s$ (if $e^{s/t}$ were rational, then $\left(e^{s/t}\right)^t = e^s$ would be rational, too). Assume that $e^s = \frac{a}{b}$ for integers $a, b > 0$, and let $n$ be so large that $n! > a s^{2n+1}$. Put

$$F(x) := s^{2n} f(x) - s^{2n-1} f'(x) + s^{2n-2} f''(x) \mp \cdots + f^{(2n)}(x),$$

where $f(x)$ is the function of the lemma.


Some irrational numbers 51 


$F(x)$ may also be written as an infinite sum

$$F(x) = s^{2n} f(x) - s^{2n-1} f'(x) + s^{2n-2} f''(x) \mp \cdots,$$

since the higher derivatives $f^{(k)}(x)$, for $k > 2n$, vanish. From this we see that the polynomial $F(x)$ satisfies the identity

$$F'(x) = -s\,F(x) + s^{2n+1} f(x).$$

Thus differentiation yields

$$\frac{d}{dx}\left[e^{sx}F(x)\right] = s\,e^{sx}F(x) + e^{sx}F'(x) = s^{2n+1}e^{sx}f(x)$$

and hence

$$N := b\int_0^1 s^{2n+1}e^{sx}f(x)\,dx = b\left[e^{sx}F(x)\right]_0^1 = aF(1) - bF(0).$$

This is an integer, since part (iii) of the lemma implies that $F(0)$ and $F(1)$ are integers. However, part (ii) of the lemma yields estimates for the size of $N$ from below and from above,

$$0 < N = b\int_0^1 s^{2n+1}e^{sx}f(x)\,dx < b\,s^{2n+1}e^{s}\,\frac{1}{n!} = \frac{a s^{2n+1}}{n!} < 1,$$

which shows that $N$ cannot be an integer: contradiction.
Now that this trick was so successful, we use it once more. 
Theorem 2. $\pi^2$ is irrational.

Proof. Assume that $\pi^2 = \frac{a}{b}$ for integers $a, b > 0$. We now use the polynomial

$$F(x) := b^n\left(\pi^{2n} f(x) - \pi^{2n-2} f^{(2)}(x) + \pi^{2n-4} f^{(4)}(x) \mp \cdots\right),$$

which satisfies $F''(x) = -\pi^2 F(x) + b^n \pi^{2n+2} f(x)$.
From part (iii) of the lemma we get that $F(0)$ and $F(1)$ are integers. Elementary differentiation rules yield

$$\frac{d}{dx}\left[F'(x)\sin\pi x - \pi F(x)\cos\pi x\right] = \left(F''(x) + \pi^2 F(x)\right)\sin\pi x = b^n\pi^{2n+2}f(x)\sin\pi x = \pi^2 a^n f(x)\sin\pi x,$$

and thus we obtain

$$N := \pi\int_0^1 a^n f(x)\sin\pi x\,dx = \left[\frac{1}{\pi}F'(x)\sin\pi x - F(x)\cos\pi x\right]_0^1 = F(0) + F(1),$$

which is an integer. Furthermore $N$ is positive since it is defined as the integral of a function that is positive (except on the boundary). However, if we choose $n$ so large that $\frac{\pi a^n}{n!} < 1$, then from part (ii) of the lemma we obtain

$$0 < N = \pi\int_0^1 a^n f(x)\sin\pi x\,dx < \frac{\pi a^n}{n!} < 1,$$

a contradiction.

$\pi$ is not rational, but it does have “good approximations” by rationals; some of these were known since antiquity:

$\frac{22}{7} = 3.142857142857\ldots$
$\frac{355}{113} = 3.141592920353\ldots$
$\frac{104348}{33215} = 3.141592653921\ldots$
$\pi = 3.141592653589\ldots$


Here comes our final irrationality result. 


Theorem 3. For every odd integer $n \ge 3$, the number

$$A(n) := \frac{1}{\pi}\arccos\left(\frac{1}{\sqrt{n}}\right)$$

is irrational.


We will need this result for Hilbert’s third problem (see Chapter 10) in the cases $n = 3$ and $n = 9$. For $n = 2$ and $n = 4$ we have $A(2) = \frac14$ and $A(4) = \frac13$, so the restriction to odd integers is essential. These values are easily derived by appealing to the diagram in the margin, in which the statement “$\frac{1}{\pi}\arccos\left(\frac{1}{\sqrt{n}}\right)$ is irrational” is equivalent to saying that the polygonal arc constructed from $\frac{1}{\sqrt{n}}$, all of whose chords have the same length, never closes into itself.

We leave it as an exercise for the reader to show that $A(n)$ is rational only for $n \in \{1, 2, 4\}$. For that, distinguish the cases when $n = 2^r$, and when $n$ is not a power of 2.


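Proof. We use the addition theorem

$$\cos\alpha + \cos\beta = 2\cos\frac{\alpha+\beta}{2}\cos\frac{\alpha-\beta}{2}$$

from elementary trigonometry, which for $\alpha = (k+1)\varphi$ and $\beta = (k-1)\varphi$ yields

$$\cos(k+1)\varphi = 2\cos\varphi\,\cos k\varphi - \cos(k-1)\varphi. \qquad (2)$$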
For the angle $\varphi_n = \arccos\left(\frac{1}{\sqrt{n}}\right)$, which is defined by $\cos\varphi_n = \frac{1}{\sqrt{n}}$ and $0 \le \varphi_n \le \pi$, this yields representations of the form

$$\cos k\varphi_n = \frac{A_k}{\sqrt{n}^{\,k}},$$

where $A_k$ is an integer that is not divisible by $n$, for all $k \ge 0$. In fact, we have such a representation for $k = 0, 1$ with $A_0 = A_1 = 1$, and by induction on $k$ using (2) we get for $k \ge 1$

$$\cos(k+1)\varphi_n = 2\,\frac{1}{\sqrt{n}}\,\frac{A_k}{\sqrt{n}^{\,k}} - \frac{A_{k-1}}{\sqrt{n}^{\,k-1}} = \frac{2A_k - nA_{k-1}}{\sqrt{n}^{\,k+1}}.$$

Thus we obtain $A_{k+1} = 2A_k - nA_{k-1}$. If $n \ge 3$ is odd, and $A_k$ is not divisible by $n$, then we find that $A_{k+1}$ cannot be divisible by $n$, either.
Now assume that

$$A(n) = \frac{1}{\pi}\varphi_n = \frac{k}{\ell}$$

is rational (with integers $k, \ell > 0$). Then $\ell\varphi_n = k\pi$ yields

$$\pm 1 = \cos k\pi = \cos \ell\varphi_n = \frac{A_\ell}{\sqrt{n}^{\,\ell}}.$$

Thus $\sqrt{n}^{\,\ell} = \pm A_\ell$ is an integer, with $\ell \ge 2$, and hence $n \mid \sqrt{n}^{\,\ell}$. With $\sqrt{n}^{\,\ell} \mid A_\ell$ we find that $n$ divides $A_\ell$, a contradiction.
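The recurrence $A_{k+1} = 2A_k - nA_{k-1}$ is easy to explore numerically. The following Python sketch (the function name `a_sequence` is our own) generates the $A_k$, checks that none of them is divisible by $n$ for a few odd $n \ge 3$, and compares $A_k/\sqrt{n}^{\,k}$ with $\cos k\varphi_n$ in floating point. For $n = 4$ it exhibits the divisibility that makes $A(4) = \frac13$ rational.

```python
import math


def a_sequence(n: int, count: int) -> list[int]:
    """A_0 = A_1 = 1 and A_{k+1} = 2*A_k - n*A_{k-1}, so that cos(k*phi_n) = A_k / sqrt(n)^k."""
    a = [1, 1]
    while len(a) < count:
        a.append(2 * a[-1] - n * a[-2])
    return a[:count]


for n in (3, 5, 9, 15):
    phi = math.acos(1 / math.sqrt(n))
    for k, a_k in enumerate(a_sequence(n, 20)):
        assert a_k % n != 0  # odd n >= 3: no A_k is divisible by n
        assert math.isclose(math.cos(k * phi), a_k / math.sqrt(n) ** k, abs_tol=1e-9)

# For n = 4 (even) the argument breaks down: A_3 = -8 is divisible by 4,
# matching cos(3 * pi/3) = -1 = -8 / 2**3.
assert a_sequence(4, 4)[3] == -8
```

Modulo an odd $n$ the recurrence reduces to $A_{k+1} \equiv 2A_k$, so the residues are powers of 2, which explains why they never vanish.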
Chapter 9

Four times π²/6


We know that the infinite series $\sum_{n\ge1}\frac{1}{n}$ does not converge. Indeed, in Chapter 1 we have seen that even the series $\sum_{p\in\mathbb{P}}\frac{1}{p}$ diverges.

However, the sum of the reciprocals of the squares converges (although very slowly, as we will also see), and it produces an interesting value.

Euler’s series

$$\sum_{n\ge1}\frac{1}{n^2} = \frac{\pi^2}{6}.$$

This is a classical, famous and important result by Leonhard Euler from 1734. One of its key interpretations is that it yields the first nontrivial value $\zeta(2)$ of Riemann’s zeta function (see the appendix on page 62). This value is irrational, as we have seen in Chapter 8.

Not only does the result have a prominent place in the history of mathematics; there are also a number of extremely elegant and clever proofs that have their own history: for some of these the joy of discovery and rediscovery has been shared by many. In this chapter, we present four such proofs.

Proof. The first proof appears as an exercise in William J. LeVeque’s number theory textbook from 1956. But he says: “I haven’t the slightest idea where that problem came from, but I’m pretty certain that it wasn’t original with me.”

$1 = 1.000000$
$1 + \frac14 = 1.250000$
$1 + \frac14 + \frac19 = 1.361111$
$1 + \frac14 + \frac19 + \frac{1}{16} = 1.423611$
$1 + \frac14 + \frac19 + \frac{1}{16} + \frac{1}{25} = 1.463611$
$1 + \frac14 + \frac19 + \frac{1}{16} + \frac{1}{25} + \frac{1}{36} = 1.491388$
$\pi^2/6 = 1.644934$
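The table of partial sums can be reproduced exactly with rational arithmetic. A minimal Python sketch (the helper `partial_sums` is our own name):

```python
from fractions import Fraction
import math


def partial_sums(m: int) -> list[Fraction]:
    """Exact partial sums 1 + 1/4 + ... + 1/k^2 for k = 1, ..., m."""
    sums, total = [], Fraction(0)
    for k in range(1, m + 1):
        total += Fraction(1, k * k)
        sums.append(total)
    return sums


for k, s in enumerate(partial_sums(6), start=1):
    print(f"{k}: {float(s):.6f}")
print(f"pi^2/6: {math.pi ** 2 / 6:.6f}")
```

The sixth partial sum is exactly $\frac{5369}{3600} = 1.49138\overline{8}$; formatting with `.6f` rounds it to 1.491389, whereas the table above truncates to 1.491388.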


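The proof consists in two different evaluations of the double integral

$$I := \int_0^1\!\!\int_0^1 \frac{1}{1-xy}\,dx\,dy.$$

For the first one, we expand $\frac{1}{1-xy}$ as a geometric series, decompose the summands as products, and integrate effortlessly:

$$I = \int_0^1\!\!\int_0^1 \sum_{n\ge0}(xy)^n\,dx\,dy = \sum_{n\ge0}\int_0^1\!\!\int_0^1 x^n y^n\,dx\,dy = \sum_{n\ge0}\left(\int_0^1 x^n\,dx\right)\left(\int_0^1 y^n\,dy\right) = \sum_{n\ge0}\frac{1}{(n+1)^2} = \sum_{n\ge1}\frac{1}{n^2} = \zeta(2).$$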


© Springer-Verlag GmbH Germany, part of Springer Nature 2018 
M. Aigner, G. M. Ziegler, Proofs from THE BOOK, https://doi.org/10.1007/978-3-662-57265-8_9




This evaluation also shows that the double integral (over a positive function with a pole at $x = y = 1$) is finite. Note that the computation is also easy and straightforward if we read it backwards; thus the evaluation of $\zeta(2)$ leads one to the double integral $I$.

The second way to evaluate $I$ comes from a change of coordinates: in the new coordinates given by $u := \frac{y+x}{2}$ and $v := \frac{y-x}{2}$ the domain of integration is a square of side length $\frac12\sqrt{2}$, which we get from the old domain by first rotating it by 45° and then shrinking it by a factor of $\sqrt{2}$. Substitution of $x = u - v$ and $y = u + v$ yields

$$\frac{1}{1-xy} = \frac{1}{1-u^2+v^2}.$$


To transform the integral, we have to replace $dx\,dy$ by $2\,du\,dv$, to compensate for the fact that our coordinate transformation reduces areas by a constant factor of 2 (which is the Jacobi determinant of the transformation; see the box on the next page). The new domain of integration, and the function to be integrated, are symmetric with respect to the $u$-axis, so we just need to compute two times (another factor of 2 arises here!) the integral over the upper half domain, which we split into two parts in the most natural way:

$$I = 4\int_0^{1/2}\left(\int_0^{u}\frac{dv}{1-u^2+v^2}\right)du + 4\int_{1/2}^{1}\left(\int_0^{1-u}\frac{dv}{1-u^2+v^2}\right)du.$$

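Using $\int\frac{dx}{a^2+x^2} = \frac{1}{a}\arctan\frac{x}{a} + C$, this becomes

$$I = 4\int_0^{1/2}\frac{1}{\sqrt{1-u^2}}\arctan\left(\frac{u}{\sqrt{1-u^2}}\right)du + 4\int_{1/2}^{1}\frac{1}{\sqrt{1-u^2}}\arctan\left(\frac{1-u}{\sqrt{1-u^2}}\right)du.$$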


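These integrals can be simplified and finally evaluated by substituting $u = \sin\theta$ resp. $u = \cos\theta$. But we proceed more directly, by computing that the derivative of $g(u) := \arctan\left(\frac{u}{\sqrt{1-u^2}}\right)$ is $g'(u) = \frac{1}{\sqrt{1-u^2}}$, while the derivative of $h(u) := \arctan\left(\frac{1-u}{\sqrt{1-u^2}}\right) = \arctan\left(\sqrt{\frac{1-u}{1+u}}\right)$ is $h'(u) = -\frac12\,\frac{1}{\sqrt{1-u^2}}$. So we may use $\int_a^b f'(x)f(x)\,dx = \left[\frac12 f(x)^2\right]_a^b = \frac12 f(b)^2 - \frac12 f(a)^2$ and get

$$I = 4\int_0^{1/2} g'(u)g(u)\,du + 4\int_{1/2}^{1} -2h'(u)h(u)\,du = 2\left[g(u)^2\right]_0^{1/2} - 4\left[h(u)^2\right]_{1/2}^{1} = 2g(\tfrac12)^2 - 2g(0)^2 - 4h(1)^2 + 4h(\tfrac12)^2 = 2\left(\frac{\pi}{6}\right)^2 + 4\left(\frac{\pi}{6}\right)^2 = \frac{\pi^2}{6}.$$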
This proof extracted the value of Euler’s series from an integral via a rather simple coordinate transformation. An ingenious proof of this type, with an entirely nontrivial coordinate transformation, was later discovered by Beukers, Calabi and Kolk. The point of departure for that proof is to split the sum $\sum_{n\ge1}\frac{1}{n^2}$ into the even terms and the odd terms. Clearly the even terms $\frac{1}{2^2} + \frac{1}{4^2} + \frac{1}{6^2} + \cdots = \sum_{k\ge1}\frac{1}{(2k)^2}$ sum to $\frac14\zeta(2)$, so the odd terms $\frac{1}{1^2} + \frac{1}{3^2} + \frac{1}{5^2} + \cdots = \sum_{k\ge0}\frac{1}{(2k+1)^2}$ make up three quarters of the total sum $\zeta(2)$. Thus Euler’s series is equivalent to

$$\sum_{k\ge0}\frac{1}{(2k+1)^2} = \frac{\pi^2}{8}.$$


Proof. As above, we may express this as a double integral, namely

$$J := \int_0^1\!\!\int_0^1 \frac{1}{1-x^2y^2}\,dx\,dy = \sum_{k\ge0}\frac{1}{(2k+1)^2}.$$


So we have to compute this integral $J$. And for this Beukers, Calabi and Kolk proposed the new coordinates

$$u := \arccos\sqrt{\frac{1-x^2}{1-x^2y^2}}, \qquad v := \arccos\sqrt{\frac{1-y^2}{1-x^2y^2}}.$$

To compute the double integral, we may ignore the boundary of the domain, and consider $x, y$ in the range $0 < x < 1$ and $0 < y < 1$. Then $u, v$ will lie in the triangle $u > 0$, $v > 0$, $u + v < \pi/2$. The coordinate transformation can be inverted explicitly, which leads one to the substitution

$$x = \frac{\sin u}{\cos v} \quad\text{and}\quad y = \frac{\sin v}{\cos u}.$$


It is easy to check that these formulas define a bijective coordinate transformation between the interior of the unit square $S = \{(x,y) : 0 \le x, y \le 1\}$ and the interior of the triangle $T = \{(u,v) : u, v \ge 0,\ u + v \le \pi/2\}$.
Now we have to compute the Jacobi determinant of the coordinate transformation, and magically it turns out to be

$$\det\begin{pmatrix}\dfrac{\cos u}{\cos v} & \dfrac{\sin u\sin v}{\cos^2 v}\\[2ex] \dfrac{\sin u\sin v}{\cos^2 u} & \dfrac{\cos v}{\cos u}\end{pmatrix} = 1 - \frac{\sin^2 u\,\sin^2 v}{\cos^2 u\,\cos^2 v} = 1 - x^2y^2.$$

But this means that the integral that we want to compute is transformed into

$$J = \int_0^{\pi/2}\!\!\int_0^{\pi/2-u} 1\,dv\,du,$$

which is just the area $\frac12\left(\frac{\pi}{2}\right)^2 = \frac{\pi^2}{8}$ of the triangle $T$.
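The “magic” Jacobian is easy to double-check numerically. A minimal Python sketch (our own, with hypothetical helper names) evaluates the analytic determinant from the matrix above and an independent central finite-difference approximation at a few points of $T$, and compares both with $1 - x^2y^2$.

```python
import math


def bck_map(u: float, v: float) -> tuple[float, float]:
    """The substitution x = sin u / cos v, y = sin v / cos u."""
    return math.sin(u) / math.cos(v), math.sin(v) / math.cos(u)


def jacobian_det_analytic(u: float, v: float) -> float:
    """Determinant of the matrix of partial derivatives written out in the text."""
    dx_du = math.cos(u) / math.cos(v)
    dx_dv = math.sin(u) * math.sin(v) / math.cos(v) ** 2
    dy_du = math.sin(u) * math.sin(v) / math.cos(u) ** 2
    dy_dv = math.cos(v) / math.cos(u)
    return dx_du * dy_dv - dx_dv * dy_du


def jacobian_det_numeric(u: float, v: float, h: float = 1e-6) -> float:
    """Independent check via central finite differences of bck_map."""
    xu1, yu1 = bck_map(u + h, v)
    xu0, yu0 = bck_map(u - h, v)
    xv1, yv1 = bck_map(u, v + h)
    xv0, yv0 = bck_map(u, v - h)
    dx_du, dy_du = (xu1 - xu0) / (2 * h), (yu1 - yu0) / (2 * h)
    dx_dv, dy_dv = (xv1 - xv0) / (2 * h), (yv1 - yv0) / (2 * h)
    return dx_du * dy_dv - dx_dv * dy_du


# Sample points (u, v) in the interior of the triangle T: u, v > 0, u + v < pi/2.
for u, v in [(0.3, 0.5), (0.1, 1.2), (0.7, 0.7), (1.0, 0.2)]:
    x, y = bck_map(u, v)
    assert 0 < x < 1 and 0 < y < 1  # the image lies in the open unit square
    expected = 1 - x**2 * y**2
    assert math.isclose(jacobian_det_analytic(u, v), expected, rel_tol=1e-12)
    assert math.isclose(jacobian_det_numeric(u, v), expected, rel_tol=1e-6)
```

The finite-difference version uses no knowledge of the partial derivatives, so agreement with $1 - x^2y^2$ is a genuine check of the entries in the matrix.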


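The Substitution Formula

To compute a double integral

$$I = \int_S f(x,y)\,dx\,dy$$

we may perform a substitution of variables

$$x = x(u,v), \qquad y = y(u,v),$$

if the correspondence of $(u,v) \in T$ to $(x,y) \in S$ is bijective and continuously differentiable. Then $I$ equals

$$\int_T f\big(x(u,v), y(u,v)\big)\left|\frac{d(x,y)}{d(u,v)}\right|du\,dv,$$

where $\frac{d(x,y)}{d(u,v)}$ is the Jacobi determinant:

$$\frac{d(x,y)}{d(u,v)} = \det\begin{pmatrix}\frac{dx}{du} & \frac{dx}{dv}\\ \frac{dy}{du} & \frac{dy}{dv}\end{pmatrix}.$$

[Figure: the unit square $S$ in the $(x,y)$-plane and the triangle $T$ in the $(u,v)$-plane with legs of length $\pi/2$.]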
Beautiful, and even more so as the same method of proof extends to the computation of $\zeta(2k)$ in terms of a $2k$-dimensional integral, for all $k \ge 1$. We refer to the original paper of Beukers, Calabi and Kolk [2], and to Chapter 26, where we’ll achieve this on a different path, using the Herglotz trick and Euler’s original approach.


After these two proofs via coordinate transformation we can’t resist the temptation to present another, entirely different and completely elementary proof for $\sum_{n\ge1}\frac{1}{n^2} = \frac{\pi^2}{6}$. It appears in a sequence of exercises in the problem book by the twin brothers Akiva and Isaak Yaglom, whose Russian original edition appeared in 1954. Versions of this beautiful proof were rediscovered and presented by F. Holme (1970), I. Papadimitriou (1973), and by Ransford (1982) who attributed it to John Scholes.


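Proof. The first step is to establish a remarkable relation between values of the (squared) cotangent function. Namely, for all $m \ge 1$ one has

$$\cot^2\left(\frac{\pi}{2m+1}\right) + \cot^2\left(\frac{2\pi}{2m+1}\right) + \cdots + \cot^2\left(\frac{m\pi}{2m+1}\right) = \frac{2m(2m-1)}{6}. \qquad (1)$$

For $m = 1, 2, 3$ this yields

$$\cot^2\frac{\pi}{3} = \frac13, \qquad \cot^2\frac{\pi}{5} + \cot^2\frac{2\pi}{5} = 2, \qquad \cot^2\frac{\pi}{7} + \cot^2\frac{2\pi}{7} + \cot^2\frac{3\pi}{7} = 5.$$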


To establish this, we start with the relation $e^{ix} = \cos x + i\sin x$. Taking the $n$-th power $e^{inx} = (e^{ix})^n$, we get

$$\cos nx + i\sin nx = (\cos x + i\sin x)^n.$$

The imaginary part of this is

$$\sin nx = \binom{n}{1}\sin x\cos^{n-1}x - \binom{n}{3}\sin^3 x\cos^{n-3}x \pm \cdots \qquad (2)$$


Now we let $n = 2m+1$, while for $x$ we will consider the $m$ different values $x = \frac{r\pi}{2m+1}$, for $r = 1, 2, \ldots, m$. For each of these values we have $nx = r\pi$, and thus $\sin nx = 0$, while $0 < x < \frac{\pi}{2}$ implies that for $\sin x$ we get $m$ distinct positive values.


In particular, we can divide (2) by $\sin^n x$, which yields

$$0 = \binom{n}{1}\cot^{n-1}x - \binom{n}{3}\cot^{n-3}x \pm \cdots,$$

that is,

$$0 = \binom{2m+1}{1}\cot^{2m}x - \binom{2m+1}{3}\cot^{2m-2}x \pm \cdots$$


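for each of the $m$ distinct values of $x$. Thus for the polynomial of degree $m$

$$p(t) := \binom{2m+1}{1}t^m - \binom{2m+1}{3}t^{m-1} \pm \cdots + (-1)^m\binom{2m+1}{2m+1}$$

we know $m$ distinct roots

$$a_r = \cot^2\left(\frac{r\pi}{2m+1}\right) \qquad\text{for } r = 1, 2, \ldots, m.$$

The roots are distinct because $\cot^2 x = \cot^2 y$ implies $\sin^2 x = \sin^2 y$ and thus $x = y$ for $x, y \in \left\{\frac{r\pi}{2m+1} : 1 \le r \le m\right\}$.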




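Hence the polynomial coincides with

$$p(t) = \binom{2m+1}{1}\left(t - \cot^2\left(\frac{\pi}{2m+1}\right)\right)\cdots\left(t - \cot^2\left(\frac{m\pi}{2m+1}\right)\right).$$

Comparison of the coefficients of $t^{m-1}$ in $p(t)$ now yields that the sum of the roots is

$$a_1 + \cdots + a_m = \frac{\binom{2m+1}{3}}{\binom{2m+1}{1}} = \frac{2m(2m-1)}{6},$$

which proves (1).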
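We also need a second identity, of the same type,

$$\csc^2\left(\frac{\pi}{2m+1}\right) + \csc^2\left(\frac{2\pi}{2m+1}\right) + \cdots + \csc^2\left(\frac{m\pi}{2m+1}\right) = \frac{2m(2m+2)}{6}, \qquad (3)$$

for the cosecant function $\csc x = \frac{1}{\sin x}$. But

$$\csc^2 x = \frac{1}{\sin^2 x} = \frac{\cos^2 x + \sin^2 x}{\sin^2 x} = \cot^2 x + 1,$$

so we can derive (3) from (1) by adding $m$ to both sides of the equation.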
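Identities (1) and (3) are exact, but a floating-point check is reassuring. The Python sketch below (helper names are our own) sums $\cot^2$ and $\csc^2$ over the angles $\frac{r\pi}{2m+1}$ with `math.fsum` and compares the results with $\frac{2m(2m-1)}{6}$ and $\frac{2m(2m+2)}{6}$.

```python
import math


def cot_squared_sum(m: int) -> float:
    """Left-hand side of (1): sum of cot^2(r*pi/(2m+1)) for r = 1, ..., m."""
    return math.fsum(1 / math.tan(r * math.pi / (2 * m + 1)) ** 2 for r in range(1, m + 1))


def csc_squared_sum(m: int) -> float:
    """Left-hand side of (3): sum of csc^2(r*pi/(2m+1)) for r = 1, ..., m."""
    return math.fsum(1 / math.sin(r * math.pi / (2 * m + 1)) ** 2 for r in range(1, m + 1))


for m in range(1, 60):
    assert math.isclose(cot_squared_sum(m), 2 * m * (2 * m - 1) / 6, rel_tol=1e-9)
    assert math.isclose(csc_squared_sum(m), 2 * m * (2 * m + 2) / 6, rel_tol=1e-9)
```

For $m = 3$ the first sum reproduces the value 5 listed above.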
Now the stage is set, and everything falls into place. We use that in the range $0 < y < \frac{\pi}{2}$ we have

$$0 < \sin y < y < \tan y,$$

and thus

$$0 < \cot y < \frac{1}{y} < \csc y,$$

which implies

$$\cot^2 y < \frac{1}{y^2} < \csc^2 y.$$


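Now we take this double inequality, apply it to each of the $m$ distinct values of $x$, and add the results. Using (1) for the left-hand side, and (3) for the right-hand side, we obtain

$$\frac{2m(2m-1)}{6} < \left(\frac{2m+1}{\pi}\right)^2 + \left(\frac{2m+1}{2\pi}\right)^2 + \cdots + \left(\frac{2m+1}{m\pi}\right)^2 < \frac{2m(2m+2)}{6},$$

that is,

$$\frac{\pi^2}{6}\,\frac{2m}{2m+1}\,\frac{2m-1}{2m+1} < \frac{1}{1^2} + \frac{1}{2^2} + \cdots + \frac{1}{m^2} < \frac{\pi^2}{6}\,\frac{2m}{2m+1}\,\frac{2m+2}{2m+1}.$$

Both the left-hand and the right-hand side converge to $\frac{\pi^2}{6}$ for $m \to \infty$: end of proof.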
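A quick numerical look at the final double inequality (a Python sketch, not part of the original proof; `euler_bounds` is our own name) shows how the two bounds enclose the partial sums:

```python
import math


def euler_bounds(m: int) -> tuple[float, float]:
    """Lower and upper bounds for 1 + 1/4 + ... + 1/m^2 from the final double inequality."""
    c = math.pi ** 2 / 6 * (2 * m) / (2 * m + 1) ** 2
    return c * (2 * m - 1), c * (2 * m + 2)


for m in (1, 10, 100, 1000, 10000):
    lower, upper = euler_bounds(m)
    s = math.fsum(1 / n ** 2 for n in range(1, m + 1))
    assert lower < s < upper
```

The upper bound equals $\frac{\pi^2}{6}\left(1 - \frac{1}{(2m+1)^2}\right)$ and is very tight, while the lower bound approaches $\pi^2/6$ only at a rate of order $1/m$; the sandwich proves convergence but is not an efficient way to compute $\pi^2/6$.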


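So how fast does $\sum\frac{1}{n^2}$ converge to $\pi^2/6$? For this we have to estimate the difference

$$\frac{\pi^2}{6} - \sum_{n=1}^{m}\frac{1}{n^2} = \sum_{n=m+1}^{\infty}\frac{1}{n^2}.$$

Comparison of coefficients: If $p(t) = c(t - a_1)\cdots(t - a_m)$, then the coefficient of $t^{m-1}$ is $-c(a_1 + \cdots + a_m)$.

$0 < a < b < c$ implies $\frac{1}{c} < \frac{1}{b} < \frac{1}{a}$.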
This is very easy with the technique of “comparing with an integral” that we have reviewed already in the appendix to Chapter 2 (page 12). It yields

$$\sum_{n=m+1}^{\infty}\frac{1}{n^2} < \int_m^{\infty}\frac{1}{t^2}\,dt = \frac{1}{m}$$

for an upper bound and

$$\sum_{n=m+1}^{\infty}\frac{1}{n^2} > \int_{m+1}^{\infty}\frac{1}{t^2}\,dt = \frac{1}{m+1}$$

for a lower bound on the “remaining summands”, or even the sharper upper bound

$$\sum_{n=m+1}^{\infty}\frac{1}{n^2} < \int_{m+\frac12}^{\infty}\frac{1}{t^2}\,dt = \frac{1}{m+\frac12}$$

if you are willing to do a slightly more careful estimate, using that the function $f(t) = \frac{1}{t^2}$ is convex.
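The three integral comparisons can be checked numerically. The sketch below (Python, double precision, our own helper `tail`) assumes that computing the tail as $\pi^2/6$ minus a compensated partial sum is accurate enough for $m \le 1000$; this holds because the gaps between the bounds (about $\frac{1}{2m^2}$ below and $\frac{1}{12m^3}$ above) stay far above rounding error in that range.

```python
import math


def tail(m: int) -> float:
    """Sum of 1/n^2 over n > m, computed as pi^2/6 minus the m-th partial sum (double precision)."""
    return math.pi ** 2 / 6 - math.fsum(1 / n ** 2 for n in range(1, m + 1))


for m in (1, 10, 100, 1000):
    r = tail(m)
    # lower bound 1/(m+1), convexity bound 1/(m + 1/2), crude bound 1/m
    assert 1 / (m + 1) < r < 1 / (m + 0.5) < 1 / m
```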

This means that our series does not converge too well; if we sum the first one thousand summands, then we expect an error in the third digit after the decimal point, while for the sum of the first one million summands, $m = 1000000$, we expect to get an error in the sixth decimal digit, and we do. However, then comes a big surprise: to an accuracy of 45 digits,

$$\pi^2/6 = 1.644934066848226436472415166646025189218949901,$$
$$\sum_{n=1}^{10^6}\frac{1}{n^2} = 1.644933066848726436305748499979391855885616544.$$
n=1 


So the sixth digit after the decimal point is wrong (too small by 1), but the next six digits are right! And then one digit is wrong (too large by 5), then again five are correct. This surprising discovery is quite recent, due to Roy D. North from Colorado Springs, 1988. (In 1982, Martin R. Powell, a school teacher from Amersham, Bucks, England, failed to notice the full effect due to the insufficient computing power available at the time.) It is too strange to be purely coincidental... A look at the error term, which again to 45 digits reads

$$\sum_{n=10^6+1}^{\infty}\frac{1}{n^2} = 0.000000999999500000166666666666633333333333357,$$


reveals that clearly there is a pattern. You might try to rewrite this last number as

$$+10^{-6} - \frac12\,10^{-12} + \frac16\,10^{-18} - \frac{1}{30}\,10^{-30} + \frac{1}{42}\,10^{-42} \pm \cdots,$$

where the coefficients $\left(1, -\frac12, \frac16, 0, -\frac{1}{30}, 0, \frac{1}{42}\right)$ of $10^{-6i}$ form the beginning of the sequence of Bernoulli numbers that we’ll meet again in Chapter 26. We refer our readers to the article by Borwein, Borwein & Dilcher [3] for more such surprising “coincidences”, and for proofs.
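Double precision cannot display the 45-digit pattern for $m = 10^6$, but the same phenomenon is visible for $m = 1000$. The Python sketch below assumes the standard asymptotic expansion $\sum_{n>m} n^{-2} \approx \frac1m - \frac{1}{2m^2} + \frac{1}{6m^3} - \frac{1}{30m^5} + \frac{1}{42m^7}$, whose coefficients are the Bernoulli numbers listed above (we take this expansion as given here rather than prove it), and compares it with the actual tail.

```python
import math


def tail(m: int) -> float:
    """Sum of 1/n^2 over n > m, computed as pi^2/6 minus the partial sum (double precision)."""
    return math.pi ** 2 / 6 - math.fsum(1 / n ** 2 for n in range(1, m + 1))


def bernoulli_tail(m: int) -> float:
    """Assumed asymptotic expansion with coefficients 1, -1/2, 1/6, 0, -1/30, 0, 1/42."""
    return 1 / m - 1 / (2 * m**2) + 1 / (6 * m**3) - 1 / (30 * m**5) + 1 / (42 * m**7)


m = 1000
print(f"{tail(m):.15e}")
print(f"{bernoulli_tail(m):.15e}")
assert abs(tail(m) - bernoulli_tail(m)) < 1e-14  # agreement up to rounding error
assert abs(tail(m) - 1 / m) > 4e-7               # the leading term alone is far off
```

With $m = 1000$ in place of $10^6$ the powers $10^{-6i}$ become $10^{-3i}$, so the same pattern of nines, zeros and sixes shows up in the first digits of the tail.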
And if only to repeat the point that it pays off to look for gems hidden in exercise sections of books, in particular if they are written by brothers, here’s our last proof for Euler’s Theorem, as sketched in Exercise 11 on page 381 of the book “Pi and the AGM” by the brothers Jonathan and Peter Borwein. It establishes that you can get Euler’s Theorem by “squaring” in an ingenious way the Gregory–Leibniz series

$$\sum_{n\ge0}\frac{(-1)^n}{2n+1} = 1 - \frac13 + \frac15 - \frac17 \pm \cdots = \frac{\pi}{4}.$$


Proof. The first trick in this proof is to consider the Gregory–Leibniz series in doubly-infinite form $\sum_{n=-\infty}^{\infty}\frac{(-1)^n}{2n+1}$. Since for negative $n = -k < 0$ we get

$$\frac{(-1)^{-k}}{2(-k)+1} = \frac{(-1)^{k-1}}{2(k-1)+1},$$

the same terms as for $n = k-1 \ge 0$, we infer that $\sum_{n=-N}^{N}\frac{(-1)^n}{2n+1}$ converges to $\frac{\pi}{2}$ with $N \to \infty$, and thus the square of this sum converges to $\frac{\pi^2}{4}$. You may write this as

$$\lim_{N\to\infty}\sum_{m,n=-N}^{N}\frac{(-1)^m}{2m+1}\,\frac{(-1)^n}{2n+1} = \frac{\pi^2}{4}.$$

The double sum may be interpreted as the sum of all entries of a square matrix of size $(2N+1)\times(2N+1)$, and we know that for $N \to \infty$ this sum of all entries tends to $\pi^2/4$. We want to know, however, that the sum of only the diagonal entries, for $m = n$, also tends to $\pi^2/4$,

$$\lim_{N\to\infty}\sum_{n=-N}^{N}\frac{1}{(2n+1)^2} = \frac{\pi^2}{4},$$

because then $\sum_{n\ge0}\frac{1}{(2n+1)^2} = \frac{\pi^2}{8}$ will follow, and this, as we know, is equivalent to Euler’s theorem. So let’s show that the sum of all off-diagonal terms tends to 0! We write $D_N$ for this sum, and use a prime to denote that the diagonal terms with $m = n$ are deleted, so

$$\begin{aligned}
D_N &= \sum_{m,n=-N}^{N}{}' \frac{(-1)^{m+n}}{(2m+1)(2n+1)}\\
&= \sum_{m,n=-N}^{N}{}' (-1)^{m+n}\left(\frac{1}{2n-2m}\,\frac{1}{2m+1} + \frac{1}{2m-2n}\,\frac{1}{2n+1}\right)\\
&= \sum_{m,n=-N}^{N}{}' (-1)^{m+n}\left(\frac{1}{2n-2m}\,\frac{1}{2m+1} + \frac{1}{2n-2m}\,\frac{1}{2m+1}\right)\\
&= -\sum_{m=-N}^{N}\frac{1}{2m+1}\left(\sum_{n=-N}^{N}{}' \frac{(-1)^{m-n}}{m-n}\right).
\end{aligned}$$

Prove the Gregory–Leibniz identity, for example by integrating the geometric series $1 - x^2 + x^4 - x^6 \pm \cdots = \frac{1}{1+x^2}$ and then evaluating at 1.

Here we use that $\frac{1}{k\ell} = \frac{1}{\ell-k}\left(\frac1k - \frac1\ell\right)$ for $k \ne \ell$. This replaces $\frac{1}{k\ell}$ by two summands that are not symmetric in $k$ and $\ell$.

For this we have interchanged $m \leftrightarrow n$ in the second part of the double sum.
We only need to show that the terms

$$c_{m,N} := \sum_{n=-N}^{N}{}' \frac{(-1)^{m-n}}{m-n}$$

are small enough in absolute value. What do we know about them? It is easy to see that $c_{-m,N} = -c_{m,N}$, so in particular $c_{0,N} = 0$. Thus we may assume that $m > 0$, and note that the summands for $n = m+k$ and $n = m-k$ cancel as long as they are in the range between $-N$ and $N$, that is, for $1 \le k \le N-m$. Thus $c_{m,N}$ equals the alternating sum of fractions of decreasing size given by the remaining terms, where the largest one occurs for $n = m - (N-m) - 1 = 2m - N - 1$, that is, $m - n = N - m + 1$. Hence

$$c_{m,N} = (-1)^{N-m+1}\left(\frac{1}{N-m+1} - \frac{1}{N-m+2} \pm \cdots - \frac{1}{m+N}\right),$$

which implies that

$$|c_{m,N}| \le \frac{1}{N-m+1}.$$

This finally yields

$$\begin{aligned}
|D_N| &\le \sum_{m=-N}^{N}\frac{1}{|2m+1|}\,|c_{m,N}| \le 2\sum_{m=1}^{N}\frac{|c_{m,N}|}{m} \le 2\sum_{m=1}^{N}\frac{1}{m(N-m+1)}\\
&= \frac{2}{N+1}\sum_{m=1}^{N}\left(\frac{1}{m} + \frac{1}{N-m+1}\right) = \frac{2}{N+1}\left(H_N + H_N\right) \le 4\,\frac{\log N + 1}{N+1},
\end{aligned}$$

and this goes to 0 as $N$ goes to infinity.

Here we use that $\frac{1}{k\ell} = \frac{1}{k+\ell}\left(\frac1k + \frac1\ell\right)$ for positive $k$ and $\ell$.

We got the estimate $H_N \le \log N + 1$ for the harmonic numbers on page 13.
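The decay of $D_N$ can be watched directly. A Python sketch (our own; `off_diagonal_sum` is a hypothetical name) uses the fact that the full double sum is the square of $\sum_{n=-N}^N \frac{(-1)^n}{2n+1}$, so that $D_N$ is this square minus the diagonal sum, and checks the bound $|D_N| \le 4\,\frac{\log N + 1}{N+1}$ derived above.

```python
import math


def off_diagonal_sum(N: int) -> float:
    """D_N: sum over m != n in [-N, N] of (-1)^(m+n) / ((2m+1)(2n+1)).

    The full double sum factors as (sum_n t_n)^2 with t_n = (-1)^n / (2n+1);
    subtracting the diagonal sum_n t_n^2 leaves exactly the off-diagonal part.
    """
    t = [(-1) ** (n % 2) / (2 * n + 1) for n in range(-N, N + 1)]
    return math.fsum(t) ** 2 - math.fsum(x * x for x in t)


for N in range(1, 300):
    assert abs(off_diagonal_sum(N)) <= 4 * (math.log(N) + 1) / (N + 1)
```

For $N = 1$ the three terms are $1, 1, -\frac13$, so $D_1 = \left(\frac53\right)^2 - \frac{19}{9} = \frac23$.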


Appendix: The Riemann zeta function

The Riemann zeta function $\zeta(s)$ is defined for real $s > 1$ by

$$\zeta(s) := \sum_{n\ge1}\frac{1}{n^s}.$$

Our estimates for $H_n$ (see page 12) imply that the series for $\zeta(1)$ diverges, but for any real $s > 1$ it does converge. The zeta function has a canonical continuation to the entire complex plane (with one simple pole at $s = 1$), which can be constructed using power series expansions. The resulting complex function is of utmost importance for the theory of prime numbers. Let us mention four diverse connections:
(1) The remarkable identity

$$\zeta(s) = \prod_{p}\frac{1}{1 - p^{-s}}$$

is due to Euler. It encodes the basic fact that every natural number has a unique (!) decomposition into prime factors; using this, Euler’s identity is a simple consequence of the geometric series expansion

$$\frac{1}{1 - p^{-s}} = 1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \frac{1}{p^{3s}} + \cdots$$

The irrationality of $\zeta(2) = \frac{\pi^2}{6}$ together with Euler’s identity implies, again, that there are infinitely many primes...
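Euler’s identity can be illustrated (not proved) by truncating the product. The following Python sketch, with our own helper names, multiplies the factors $\frac{1}{1-p^{-2}}$ over all primes up to a bound and compares the result with $\pi^2/6$; the truncated product stays below $\zeta(2)$ because every omitted factor exceeds 1.

```python
import math


def primes_up_to(limit: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p <= limit."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [p for p in range(limit + 1) if is_prime[p]]


def euler_product(s: float, limit: int) -> float:
    """Truncated Euler product: product over primes p <= limit of 1 / (1 - p^(-s))."""
    result = 1.0
    for p in primes_up_to(limit):
        result /= 1 - p ** (-s)
    return result


zeta2 = math.pi ** 2 / 6
approx = euler_product(2, 100_000)
assert approx < zeta2          # every omitted factor exceeds 1
assert zeta2 - approx < 1e-4   # the omitted primes contribute very little
```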


(2) The following marvelous argument of Don Zagier computes $\zeta(4)$ from $\zeta(2)$. Consider the function

$$f(m,n) := \frac{1}{m^3 n} + \frac{1}{2m^2n^2} + \frac{1}{mn^3}$$

for integers $m, n \ge 1$. It is easily verified that for all $m$ and $n$,

$$f(m,n) - f(m+n,n) - f(m,m+n) = \frac{1}{m^2n^2}.$$

Let us sum this equation over all $m, n \ge 1$. If $i \ne j$, then $(i,j)$ is either of the form $(m+n,n)$ or of the form $(m,m+n)$, for $m, n \ge 1$. Thus, in the sum on the left-hand side all terms $f(i,j)$ with $i \ne j$ cancel, and so

$$\sum_{n\ge1} f(n,n) = \sum_{n\ge1}\frac{5/2}{n^4} = \frac52\,\zeta(4)$$

remains. For the right-hand side one obtains

$$\sum_{m,n\ge1}\frac{1}{m^2n^2} = \sum_{m\ge1}\frac{1}{m^2}\sum_{n\ge1}\frac{1}{n^2} = \zeta(2)^2,$$

and out comes the equality

$$\frac52\,\zeta(4) = \zeta(2)^2.$$

With $\zeta(2) = \frac{\pi^2}{6}$ we thus get $\zeta(4) = \frac{\pi^4}{90}$. Another derivation via Bernoulli numbers appears in Chapter 26.
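Zagier’s identity is a finite algebraic statement, so it can be verified exactly for small arguments. A short Python sketch using exact fractions (the function name `zagier_f` is ours) checks it for small $m, n$, confirms $f(n,n) = \frac{5}{2n^4}$, and compares a partial sum of $\sum 1/n^4$ with $\pi^4/90$ in floating point.

```python
from fractions import Fraction
import math


def zagier_f(m: int, n: int) -> Fraction:
    """Zagier's function f(m, n) = 1/(m^3 n) + 1/(2 m^2 n^2) + 1/(m n^3), computed exactly."""
    return Fraction(1, m**3 * n) + Fraction(1, 2 * m**2 * n**2) + Fraction(1, m * n**3)


# The identity f(m,n) - f(m+n,n) - f(m,m+n) = 1/(m^2 n^2), checked exactly.
for m in range(1, 40):
    for n in range(1, 40):
        lhs = zagier_f(m, n) - zagier_f(m + n, n) - zagier_f(m, m + n)
        assert lhs == Fraction(1, m**2 * n**2)

# Diagonal terms: f(n, n) = 5 / (2 n^4).
assert all(zagier_f(n, n) == Fraction(5, 2 * n**4) for n in range(1, 40))

# Numerical sanity check of the conclusion zeta(4) = pi^4 / 90.
zeta4 = math.fsum(1 / n**4 for n in range(1, 100_000))
assert math.isclose(zeta4, math.pi**4 / 90, rel_tol=1e-12)
```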
(3) It has been known for a long time that $\zeta(s)$ is a rational multiple of $\pi^s$, and hence irrational, if $s$ is an even integer $s \ge 2$; see Chapter 26. In contrast, the irrationality of $\zeta(3)$ was proved by Roger Apéry only in 1979. Despite considerable effort the picture is rather incomplete about $\zeta(s)$ for the other odd integers, $s = 2t+1 \ge 5$. However, Keith Ball and Tanguy Rivoal proved that infinitely many of the values $\zeta(2t+1)$ are irrational. And indeed, although it is not known for any single odd value $s \ge 5$ that $\zeta(s)$ is irrational, Wadim Zudilin has proved that at least one of the four values $\zeta(5)$, $\zeta(7)$, $\zeta(9)$, and $\zeta(11)$ is irrational. We refer to the beautiful survey by Fischler.


(4) The location of the complex zeros of the zeta function is the subject of the “Riemann hypothesis”: one of the most famous and important unresolved conjectures in all of mathematics. It claims that all the nontrivial zeros $s \in \mathbb{C}$ of the zeta function satisfy $\mathrm{Re}(s) = \frac12$. (The zeta function vanishes at all the negative even integers, which are referred to as the “trivial zeros.”)


Surprisingly, Jeff Lagarias showed that the Riemann hypothesis is equivalent to the following elementary statement: For all $n \ge 1$,

$$\sum_{d \mid n} d \le H_n + \exp(H_n)\log(H_n),$$

with equality only for $n = 1$, where $H_n$ is again the $n$-th harmonic number.
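Lagarias’ inequality is elementary enough to test by machine. The Python sketch below (our own helpers `sigma` and `lagarias_holds`, with harmonic numbers in floating point) checks it for $n \le 5000$; such a finite check is only an illustration and says nothing about the Riemann hypothesis itself.

```python
import math


def sigma(n: int) -> int:
    """Sum of the positive divisors of n, by trial division up to sqrt(n)."""
    total = 0
    for d in range(1, math.isqrt(n) + 1):
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
    return total


def lagarias_holds(n_max: int) -> bool:
    """Check sigma(n) <= H_n + exp(H_n) * log(H_n) for 1 <= n <= n_max, strictly for n > 1."""
    h = 0.0
    for n in range(1, n_max + 1):
        h += 1 / n  # h is the harmonic number H_n
        bound = h + math.exp(h) * math.log(h)
        s = sigma(n)
        if s > bound or (n > 1 and s == bound):
            return False
    return True


assert sigma(12) == 28
assert lagarias_holds(5000)
```

For $n = 1$ both sides equal 1 exactly, since $\log H_1 = \log 1 = 0$.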
Chapter 10

Hilbert’s third problem: decomposing polyhedra


In his legendary address to the International Congress of Mathematicians 
at Paris in 1900 David Hilbert asked — as the third of his twenty-three 
problems — to specify 


“two tetrahedra of equal bases and equal altitudes which can in 
no way be split into congruent tetrahedra, and which cannot be 
combined with congruent tetrahedra to form two polyhedra which 
themselves could be split up into congruent tetrahedra.” 


This problem can be traced back to two letters of Carl Friedrich Gauss from 1844 (published in Gauss’ collected works in 1900). If tetrahedra of equal volume could be split into congruent pieces, then this would give one an “elementary” proof of Euclid’s theorem XII.5 that pyramids with the same base and height have the same volume. It would thus provide an elementary definition of the volume for polyhedra (that would not depend on continuity arguments). A similar statement is true in plane geometry: the Bolyai–Gerwien Theorem [1, Sect. 2.7] states that planar polygons are both equidecomposable (can be dissected into congruent triangles) and equicomplementable (can be made equidecomposable by adding congruent triangles) if and only if they have the same area.


The cross is equicomplementable with a square of the same area: By adding the same four triangles we can make them equidecomposable (indeed: congruent). In fact, the cross and the square are even equidecomposable.


© Springer-Verlag GmbH Germany, part of Springer Nature 2018 
M. Aigner, G. M. Ziegler, Proofs from THE BOOK, https://doi.org/10.1007/978-3-662-57265-8_10 
This equidecomposition of a square and an equilateral triangle into four pieces is due to Henry Dudeney (1902).

The short segment in the middle of the equilateral triangle is the intersection of pieces A and C, but it is not an edge of any one of the pieces.


Hilbert, as we can see from his wording of the problem, did expect that there is no analogous theorem for dimension three, and he was right. In fact, the problem was completely solved by Hilbert’s student Max Dehn in two papers: The first one, exhibiting non-equidecomposable tetrahedra of equal base and height, appeared already in 1900; the second one, also covering equicomplementability, appeared in 1902. However, Dehn’s papers are not easy to understand, and it takes effort to see whether Dehn did not fall into a subtle trap which ensnared others: a very elegant but unfortunately wrong proof was found by Raoul Bricard (in 1896!), by Herbert Meschkowski (1960), and probably by others. However, Dehn’s proof was reworked by others, clarified and redone, and after combined efforts of several authors one arrived at the “classical proof”, as presented in Boltianskii’s book on Hilbert’s third problem and also in earlier editions of this one.

In the following, however, we take advantage of a decisive simplification that was found by V. F. Kagan from Odessa already in 1903: His integrality argument, which we here present as the “cone lemma”, yields a “pearl lemma” (given here in a recent version, due to Benko), and from this we derive a correct and complete proof for “Bricard’s condition” (as claimed in Bricard’s 1896 paper). Once we apply this to some examples we easily obtain the solution of Hilbert’s third problem.

The appendix to this chapter provides some basics about polyhedra. 


As above we call two polyhedra $P$ and $Q$ equidecomposable if they can be decomposed into finite sets of polyhedra $P_1, \ldots, P_n$ and $Q_1, \ldots, Q_n$ such that $P_i$ and $Q_i$ are congruent for all $i$. Two polyhedra are equicomplementable if there are equidecomposable polyhedra $\tilde{P} = P''_1 \cup \cdots \cup P''_n$ and $\tilde{Q} = Q''_1 \cup \cdots \cup Q''_n$ that also have decompositions involving $P$ and $Q$ of the form $\tilde{P} = P \cup P'_1 \cup P'_2 \cup \cdots \cup P'_m$ and $\tilde{Q} = Q \cup Q'_1 \cup Q'_2 \cup \cdots \cup Q'_m$, where $P'_k$ is congruent to $Q'_k$ for all $k$. (See the large figure to the right for an illustration.) A theorem of Gerling from 1844 [1, §12] implies that for these definitions it does not matter whether we admit reflections when considering congruences, or not.

For polygons in the plane, equidecomposability and equicomplementability 
are defined analogously. 

Clearly, equidecomposable objects are equicomplementable (this is the case 
m = 0), but the converse is far from clear. We will use “Bricard’s condi- 
tion” as our tool to certify — as Hilbert proposed — that certain tetrahedra 
of equal volume are not equicomplementable, and in particular not equi- 
decomposable. 

Before we really start to work with three-dimensional polyhedra, let us 
derive the pearl lemma, which is equally interesting also for planar decom- 
positions. It refers to the segments in a decomposition: In any decompo- 
sition the edges of one piece may be subdivided by vertices or edges of 
other pieces; the pieces of this subdivision we call segments. Thus in the 
two-dimensional case any endpoint of a segment is given by some vertex. 
In the three-dimensional case the end of a segment may also be given by 
a crossing of two edges. However, in any case all the interior points of a 
segment belong to the same set of edges of pieces. 
[Figure: the decompositions $\tilde{P} = P \cup P'_1 \cup \cdots \cup P'_m$, $\tilde{Q} = Q \cup Q'_1 \cup \cdots \cup Q'_m$ and $\tilde{P} = P''_1 \cup \cdots \cup P''_n$.]


The Pearl Lemma. If $P$ and $Q$ are equidecomposable, then one can place a positive number of pearls (that is, assign positive integers) to all the segments of the decompositions $P = P_1 \cup \cdots \cup P_n$ and $Q = Q_1 \cup \cdots \cup Q_n$ in such a way that each edge of a piece $P_k$ receives the same number of pearls as the corresponding edge of $Q_k$.


Proof. Assign a variable $x_i$ to each segment in the decomposition of $P$ and a variable $y_j$ to each segment in the decomposition of $Q$. Now we have to find positive integer values for the variables $x_i$ and $y_j$ in such a way that the $x_i$-variables corresponding to the segments of any edge of some $P_k$ yield the same sum as the $y_j$-variables assigned to the segments of the corresponding edge of $Q_k$. This yields conditions that require that “some $x_i$-variables have the same sum as some $y_j$-variables”, namely

$$\sum_{i:\, s_i \subseteq e} x_i - \sum_{j:\, s'_j \subseteq e'} y_j = 0,$$

where the edge $e \subseteq P_k$ decomposes into the segments $s_i$, while the corresponding edge $e' \subseteq Q_k$ decomposes into the segments $s'_j$. This is a linear equation with integer coefficients.

We note, however, that positive real values satisfying all these requirements exist, namely the (real) lengths of the segments! Thus we are done, in view of the following lemma. □


The polygons $P$ and $Q$ considered in the figure above are, indeed, equidecomposable. The figure to the right illustrates this, and shows a possible placement of pearls.

For a parallelogram $P$ and a nonconvex hexagon $Q$ that are equicomplementable, this figure illustrates the four decompositions we refer to.
[Figure: the line $2x_1 - 3x_2 = 0$ in the $(x_1, x_2)$-plane.]

Example: Here $\bar{C}$ is given by $2x_1 - 3x_2 = 0$, $x_i \ge 1$. Eliminating $x_2$ yields $x_1 \ge \frac32$. The lexicographically minimal solution to the system is $\left(\frac32, 1\right)$.


The Cone Lemma. If a system of homogeneous linear equations with integer coefficients has a positive real solution, then it also has a positive integer solution.


Proof. The name of this lemma stems from the interpretation that the set

$$C = \{x \in \mathbb{R}^N : Ax = 0,\ x > 0\}$$

given by an integer matrix $A \in \mathbb{Z}^{M\times N}$ describes a (relatively open) rational cone. We have to show that if this is nonempty, then it also contains integer points: $C \cap \mathbb{N}^N \ne \emptyset$.

If $C$ is nonempty, then so is $\bar{C} := \{x \in \mathbb{R}^N : Ax = 0,\ x \ge \mathbf{1}\}$, since for any positive vector a suitable multiple will have all coordinates equal to or larger than 1. (Here $\mathbf{1}$ denotes the vector with all coordinates equal to 1.) It suffices to verify that $\bar{C} \subseteq C$ contains a point with rational coordinates, since then multiplication with a common denominator for all coordinates will yield an integer point in $\bar{C} \subseteq C$.

There are many ways to prove this. We follow a well-trodden path that was first explored by Fourier and Motzkin [8, Lecture 1]: By “Fourier–Motzkin elimination” we show that the lexicographically smallest solution to the system

$$Ax = 0, \quad x \ge \mathbf{1}$$

exists, and that it is rational if the matrix $A$ is integral.

Indeed, any linear equation $a^T x = 0$ can be equivalently enforced by two inequalities $a^T x \ge 0$, $-a^T x \ge 0$. (Here $a$ denotes a column vector and $a^T$ its transpose.) Thus it suffices to prove that any system of the type

$$Ax \ge b, \quad x \ge \mathbf{1}$$

with integral $A$ and $b$ has a lexicographically smallest solution, which is rational, provided that the system has any real solution at all.


For this we argue with induction on $N$. The case $N = 1$ is clear. For $N > 1$ look at all the inequalities that involve $x_N$. If $x' = (x_1, \ldots, x_{N-1})$ is fixed, these inequalities give lower bounds on $x_N$ (among them $x_N \ge 1$) and possibly also upper bounds. So we form a new system $A'x' \ge b'$, $x' \ge \mathbf{1}$ in $N-1$ variables, which contains all the inequalities from the system $Ax \ge b$ that do not involve $x_N$, as well as all the inequalities obtained by requiring that all upper bounds on $x_N$ (if there are any) are larger than or equal to all the lower bounds on $x_N$ (which include $x_N \ge 1$). This system in $N-1$ variables has a solution, and thus by induction it has a lexicographically minimal solution $x'_*$, which is rational. And then the smallest $x_N$ compatible with this solution $x'_*$ is easily found: it is determined by a linear equation or inequality with integer coefficients, and thus it is rational as well.
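The Fourier–Motzkin argument is constructive, so it can be run directly. The following Python sketch (our own illustration with hypothetical names `eliminate`, `lexmin` and `cone_integer_point`; it is not optimized, and the number of inequalities can grow quickly) eliminates variables from the last one down, then recovers the lexicographically smallest solution of $Ax \ge b$, $x \ge \mathbf{1}$ coordinate by coordinate with exact rational arithmetic, and finally scales by a common denominator as in the proof of the Cone Lemma.

```python
from fractions import Fraction
from math import lcm
from typing import Optional

Inequality = tuple[list[Fraction], Fraction]  # (a, b) encodes a . x >= b


def eliminate(ineqs: list[Inequality], k: int) -> list[Inequality]:
    """One Fourier-Motzkin step: project the system a . x >= b onto the variables other than x_k."""
    lower, upper, rest = [], [], []
    for a, b in ineqs:
        if a[k] > 0:
            lower.append((a, b))  # gives a lower bound on x_k
        elif a[k] < 0:
            upper.append((a, b))  # gives an upper bound on x_k
        else:
            rest.append((a, b))
    for al, bl in lower:
        for au, bu in upper:
            # Positive multipliers making the x_k terms cancel ("upper bound >= lower bound").
            cl, cu = -au[k], al[k]
            rest.append(([cl * p + cu * q for p, q in zip(al, au)], cl * bl + cu * bu))
    return rest


def lexmin(A: list[list[int]], b: list[int]) -> Optional[list[Fraction]]:
    """Lexicographically smallest x with A x >= b and x >= 1, or None if there is none."""
    n = len(A[0])
    system = [([Fraction(v) for v in row], Fraction(bi)) for row, bi in zip(A, b)]
    for i in range(n):  # the constraints x_i >= 1
        system.append(([Fraction(int(i == j)) for j in range(n)], Fraction(1)))
    stages = [system]  # stages[j] involves only x_0, ..., x_{n-1-j}
    for k in range(n - 1, 0, -1):
        stages.append(eliminate(stages[-1], k))
    x: list[Fraction] = []
    for k in range(n):
        lo = None
        for a, bb in stages[n - 1 - k]:
            if a[k] > 0:  # lower bound on x_k, given the already fixed x_0, ..., x_{k-1}
                bound = (bb - sum(a[i] * x[i] for i in range(k))) / a[k]
                lo = bound if lo is None or bound > lo else lo
        x.append(lo)  # never None: the constraint x_k >= 1 survives in this stage
    feasible = all(sum(ai * xi for ai, xi in zip(a, x)) >= bb for a, bb in system)
    return x if feasible else None


def cone_integer_point(A: list[list[int]]) -> Optional[list[int]]:
    """Positive integer solution of A x = 0 (Cone Lemma), via the rational lexmin point."""
    rows = [list(r) for r in A] + [[-v for v in r] for r in A]  # A x >= 0 and -A x >= 0
    x = lexmin(rows, [0] * len(rows))
    if x is None:
        return None
    d = lcm(*(xi.denominator for xi in x))  # common denominator of all coordinates
    return [int(xi * d) for xi in x]


print(lexmin([[2, -3], [-2, 3]], [0, 0]))  # [Fraction(3, 2), Fraction(1, 1)]
print(cone_integer_point([[2, -3]]))       # [3, 2]
```

On the margin example $2x_1 - 3x_2 = 0$ this reproduces the lexicographically minimal point $\left(\frac32, 1\right)$ and the integer point $(3, 2)$; for an equation such as $x_1 + x_2 = 0$, which has no positive solution, it returns `None`.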




Now we focus on decompositions of three-dimensional polyhedra. The 
dihedral angles, that is, the angles between adjacent facets, play a decisive 
role in the following theorem. 


Theorem. ("Bricard's condition")

If three-dimensional polyhedra P and Q with dihedral angles $\alpha_1, \dots, \alpha_r$
resp. $\beta_1, \dots, \beta_s$ are equidecomposable, then there are positive integers $m_i$,
$n_j$ and an integer k with

$$m_1\alpha_1 + \dots + m_r\alpha_r = n_1\beta_1 + \dots + n_s\beta_s + k\pi.$$

The same holds more generally if P and Q are equicomplementable.


Proof. Let us first assume that P and Q are equidecomposable, with
decompositions $P = P_1 \cup \dots \cup P_n$ and $Q = Q_1 \cup \dots \cup Q_n$, where $P_i$ is
congruent to $Q_i$. We assign a positive number of pearls to every segment
in both decompositions, according to the pearl lemma.

Let $\Sigma_1$ be the sum of all the dihedral angles at all the pearls in the pieces
of the decomposition of P. If an edge of a piece $P_i$ contains several pearls,
then the dihedral angle at this edge will appear several times in the sum $\Sigma_1$.
If a pearl is contained in several pieces, then several angles are added for
this pearl, but they are all measured in the plane through the pearl that is
orthogonal to the corresponding segment. If the segment is contained in
an edge of P, the addition yields the (interior) dihedral angle $\alpha_j$ at the
edge. The addition yields the angle π in the case that the segment lies in
the boundary of P but not on an edge. If the pearl/the segment lies in the
interior of P, then the sum of dihedral angles yields 2π or π. (The latter
case occurs if the pearl lies in the interior of a face of a piece $P_i$.)


Thus we get a representation

$$\Sigma_1 = m_1\alpha_1 + \dots + m_r\alpha_r + k_1\pi$$

for positive integers $m_j$ ($1 \le j \le r$) and nonnegative $k_1$. Similarly, for the
sum $\Sigma_2$ of all the angles at the pearls of the decomposition of Q we get

$$\Sigma_2 = n_1\beta_1 + \dots + n_s\beta_s + k_2\pi$$

for positive integers $n_j$ ($1 \le j \le s$) and nonnegative $k_2$.

However, we can also obtain the sums $\Sigma_1$ and $\Sigma_2$ by adding all the contribu-
tions in the individual pieces $P_i$ and $Q_i$. Since $P_i$ and $Q_i$ are congruent, we
measure the same dihedral angles at the corresponding edges, and the pearl
lemma guarantees that we get the same number of pearls from the decom-
positions of P resp. Q at the corresponding edges. Thus we get $\Sigma_1 = \Sigma_2$,
which yields Bricard's condition (with $k = k_2 - k_1 \in \mathbb{Z}$) for the case of
equidecomposability.


Now let us assume that P and Q are equicomplementable, that is, that we
have decompositions

$$\tilde P = P \cup P'_1 \cup \dots \cup P'_n \quad\text{and}\quad \tilde Q = Q \cup Q'_1 \cup \dots \cup Q'_n,$$

where $P'_i$ and $Q'_i$ are congruent, and such that $\tilde P$ and $\tilde Q$ are equidecompos-
able, as

$$\tilde P = P''_1 \cup \dots \cup P''_m \quad\text{and}\quad \tilde Q = Q''_1 \cup \dots \cup Q''_m,$$

where $P''_i$ and $Q''_i$ are congruent (as in the figure on page 69). Again, using
the pearl lemma, we place pearls on all the segments in all four decompo-
sitions, where we impose the extra condition that each edge of $\tilde P$ gets the
same total number of pearls in both decompositions, and similarly for $\tilde Q$.
(The proof of the pearl lemma via the cone lemma allows for such extra
restrictions!) We also compute the sums of angles at pearls $\Sigma_1$ and $\Sigma_2$
(for the decompositions that contain P resp. Q as a piece) as well as $\Sigma'_1$
and $\Sigma'_2$ (for the decompositions into the pieces $P''_i$ resp. $Q''_i$).

In a cube, all dihedral angles are $\frac{\pi}{2}$.

For a prism over an equilateral triangle,
we get the dihedral angles $\frac{\pi}{3}$ and $\frac{\pi}{2}$.

The angle sums $\Sigma'_1$ and $\Sigma'_2$ refer to decompositions of different polyhedra,
$\tilde P$ and $\tilde Q$, into the same set of pieces, hence we get $\Sigma'_1 = \Sigma'_2$ as above.
The angle sums $\Sigma_1$ and $\Sigma'_1$ refer to different decompositions of the same
polyhedron, $\tilde P$. Since we have put the same number of pearls onto the edges
in both decompositions, the argument above yields $\Sigma_1 = \Sigma'_1 + \ell_1\pi$ for an
integer $\ell_1 \in \mathbb{Z}$. In the same way we also get $\Sigma_2 = \Sigma'_2 + \ell_2\pi$ for an integer
$\ell_2 \in \mathbb{Z}$. Thus we conclude that

$$\Sigma_2 = \Sigma_1 + \ell\pi \quad\text{for } \ell = \ell_2 - \ell_1 \in \mathbb{Z}.$$

However, $\Sigma_1$ and $\Sigma_2$ refer to decompositions of $\tilde P$ resp. $\tilde Q$ into the same
pieces, except that the first one uses P as a piece, while the second uses Q.
Thus subtracting the contributions of the $P'_i$ resp. $Q'_i$ from both sides, we obtain
the desired conclusion: the contributions of P and Q to the respective angle
sums,

$$m_1\alpha_1 + \dots + m_r\alpha_r \quad\text{and}\quad n_1\beta_1 + \dots + n_s\beta_s,$$

where $m_j$ counts the pearls on edges with dihedral angle $\alpha_j$ in P and $n_j$
counts the pearls on edges with dihedral angle $\beta_j$ in Q, differ by an integer
multiple of π, namely by $\ell\pi$.


From Bricard’s condition we now get a complete solution for Hilbert’s third 
problem: We just have to compute the dihedral angles for some examples. 


Example 1. For a regular tetrahedron $T_0$, all of whose edges have the same
length, we calculate the dihedral angle from the sketch. The midpoint M of the
base triangle divides the height AE of the base triangle in the ratio 1:2, so
$|ME| = \frac13|AE|$; and since $|AE| = |DE|$, we find $\cos\alpha = |ME|/|DE| = \frac13$ and thus

$$\alpha = \arccos\tfrac{1}{3}.$$


Thus we find that a regular tetrahedron cannot be equidecomposable or
equicomplementable with a cube. Indeed, all the dihedral angles in a cube
equal $\frac{\pi}{2}$, so Bricard's condition requires that

$$m_1 \arccos\tfrac{1}{3} = n_1\tfrac{\pi}{2} + k\pi$$

for positive integers $m_1, n_1$ and an integer k. But this cannot hold, since
we know from Theorem 3 of Chapter 8 that $\frac{1}{\pi}\arccos\frac{1}{3}$ is irrational.
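The irrationality of $\frac{1}{\pi}\arccos\frac13$ is quoted from Chapter 8. A short exact computation, which only checks finitely many multiples and is no substitute for the proof there, shows the mechanism: if $\alpha = \arccos\frac13$ were a rational multiple $\frac{p}{q}\pi$, then $\cos(q\alpha) = \pm 1$; but the recurrence $\cos((n+1)\alpha) = 2\cos\alpha\cos(n\alpha) - \cos((n-1)\alpha)$ produces values whose denominators are exactly $3^n$. The function name `cos_multiples` is hypothetical.

```python
from fractions import Fraction


def cos_multiples(c, n_max):
    """Exact values cos(n * alpha) for n = 0..n_max, given the rational c = cos(alpha).

    Uses cos((n+1) a) = 2 cos(a) cos(n a) - cos((n-1) a).
    """
    c = Fraction(c)
    values = [Fraction(1), c]
    while len(values) <= n_max:
        values.append(2 * c * values[-1] - values[-2])
    return values[: n_max + 1]


vals = cos_multiples(Fraction(1, 3), 30)
# cos(n * alpha) has denominator exactly 3^n, so it is never +1 or -1 for n >= 1.
print(all(v.denominator == 3 ** n for n, v in enumerate(vals)))  # prints True
```

The first values are $1, \frac13, -\frac79, -\frac{23}{27}$; the numerators are never divisible by 3, which is the heart of the argument in Chapter 8.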
Example 2. Let $T_1$ be a tetrahedron spanned by three orthogonal edges
AB, AC, AD of length u. This tetrahedron has three dihedral angles that
are right angles, and three more dihedral angles of equal size γ, which we
calculate from the sketch as

$$\cos\gamma = \frac{|AE|}{|DE|} = \frac{\frac12\sqrt2\,u}{\frac12\sqrt3\,\sqrt2\,u} = \frac{1}{\sqrt3}.$$

It follows that

$$\gamma = \arccos\tfrac{1}{\sqrt3}.$$

Thus the only dihedral angles occurring in $T_1$ are $\frac{\pi}{2}$ and $\arccos\frac{1}{\sqrt3}$.
From this, Bricard's condition tells us that this tetrahedron, too, is not
equicomplementable with a cube of the same volume, this time using that

$$\frac{1}{\pi}\arccos\frac{1}{\sqrt3}$$

is irrational, as we proved in Chapter 8 (take n = 3 in Theorem 3).


Example 3. Finally, let $T_2$ be a tetrahedron with three consecutive edges
AB, BC and CD that are mutually orthogonal (an "orthoscheme") and of
the same length u.

It is easy to calculate the angles in such a tetrahedron (three of them equal $\frac{\pi}{2}$,
two of them equal $\frac{\pi}{4}$, and one of them is $\frac{\pi}{3}$) if we use that the cube of side
length u can be decomposed into six tetrahedra of this type (three congru-
ent copies, and three mirror images). Thus all dihedral angles in $T_2$ are
rational multiples of π, and thus with the same proofs as above (in particu-
lar, the irrationality results that we have quoted from Chapter 8) Bricard's
condition implies that $T_2$ is not equidecomposable, and not even equicom-
plementable, with $T_0$ or $T_1$.
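The dihedral angles claimed in Examples 1 to 3 can be checked numerically from coordinates. The sketch below (hypothetical helper `dihedral_angles`) measures, for each edge pq of a tetrahedron, the angle between the components of the two remaining vertices orthogonal to pq, which is the angle between the two faces in a plane orthogonal to the edge. The coordinates are one convenient choice with u = 1, not taken from the book's sketches.

```python
import itertools
import math


def _perp_to_edge(r, p, e):
    """Component of the vector r - p orthogonal to the edge direction e."""
    v = [r[t] - p[t] for t in range(3)]
    s = sum(a * b for a, b in zip(v, e)) / sum(a * a for a in e)
    return [v[t] - s * e[t] for t in range(3)]


def dihedral_angles(vertices):
    """Interior dihedral angle (in radians) at each of the six edges of a tetrahedron."""
    angles = {}
    for i, j in itertools.combinations(range(4), 2):
        k1, k2 = [v for v in range(4) if v not in (i, j)]
        p, q = vertices[i], vertices[j]
        e = [q[t] - p[t] for t in range(3)]
        # Angle between the two faces through pq, measured orthogonally to pq.
        u = _perp_to_edge(vertices[k1], p, e)
        w = _perp_to_edge(vertices[k2], p, e)
        c = sum(a * b for a, b in zip(u, w)) / (math.hypot(*u) * math.hypot(*w))
        angles[(i, j)] = math.acos(max(-1.0, min(1.0, c)))
    return angles


T0 = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]  # regular tetrahedron
T1 = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]        # AB, AC, AD orthogonal
T2 = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]        # AB, BC, CD orthogonal

for name, T in [("T0", T0), ("T1", T1), ("T2", T2)]:
    # Each dihedral angle, printed as a multiple of pi.
    print(name, sorted(round(a / math.pi, 4) for a in dihedral_angles(T).values()))
```

For $T_2$ the six angles come out as $\frac{\pi}{4}, \frac{\pi}{4}, \frac{\pi}{3}, \frac{\pi}{2}, \frac{\pi}{2}, \frac{\pi}{2}$, matching the decomposition argument; for $T_0$ all six equal $\arccos\frac13$.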


This solves Hilbert's third problem, since $T_1$ and $T_2$ have congruent bases
and the same height.


Appendix: Polytopes and polyhedra 


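A convex polytope in $\mathbb{R}^d$ is the convex hull of a finite set $S = \{s_1, \dots, s_n\}$,
that is, a set of the form

$$P = \operatorname{conv}(S) = \Big\{ \sum_{i=1}^{n} \lambda_i s_i : \lambda_i \ge 0,\ \sum_{i=1}^{n} \lambda_i = 1 \Big\}.$$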


Polytopes are certainly familiar objects: Prime examples are given by con- 
vex polygons (2-dimensional convex polytopes) and by convex polyhedra 
(3-dimensional convex polytopes). 
Familiar polytopes:
tetrahedron and cube


The permutahedron has 24 vertices,
36 edges and 14 facets.


There are several types of polyhedra that generalize to higher dimensions
in a natural way. For example, if the set S is affinely independent of
cardinality d + 1, then conv(S) is a d-dimensional simplex (or d-simplex).
For d = 2 this yields a triangle, for d = 3 we obtain a tetrahedron. Simi-
larly, squares and cubes are special cases of d-cubes, such as the unit d-cube
given by

$$C_d = [0,1]^d \subseteq \mathbb{R}^d.$$


General polytopes are defined as finite unions of convex polytopes. In this 
book nonconvex polyhedra will appear in connection with Cauchy’s rigidity 
theorem in Chapter 14, and nonconvex polygons in connection with Pick’s 
theorem in Chapter 13, and again when we discuss the art gallery theorem 
in Chapter 40. 

Convex polytopes can, equivalently, be defined as the bounded solution sets
of finite systems of linear inequalities. Thus every convex polytope $P \subseteq \mathbb{R}^d$
has a representation of the form

$$P = \{x \in \mathbb{R}^d : Ax \le b\}$$

for some matrix $A \in \mathbb{R}^{m \times d}$ and a vector $b \in \mathbb{R}^m$. In other words, P is the
solution set of a system of m linear inequalities

$$a_i^T x \le b_i,$$

where $a_i^T$ is the i-th row of A. Conversely, every bounded such solution
set is a convex polytope, and can thus be represented as the convex hull of
a finite set of points.
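The passage from inequalities to a finite point set can be made concrete for small examples. The following brute-force sketch (hypothetical helpers `solve_exact` and `vertices`; exponential in the number of inequalities, and assuming the solution set is bounded) recovers the vertex set: every vertex is the unique point where d linearly independent inequalities hold with equality while all others remain valid.

```python
from fractions import Fraction
from itertools import combinations


def solve_exact(M, rhs):
    """Solve the square system M x = rhs over the rationals; None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if piv is None:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                f = aug[i][col] / aug[col][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return tuple(aug[i][n] / aug[i][i] for i in range(n))


def vertices(A, b):
    """Vertices of the bounded polytope {x : A x <= b}, found by brute force."""
    d = len(A[0])
    found = set()
    for rows in combinations(range(len(A)), d):
        # Make d of the inequalities tight and solve for the intersection point.
        x = solve_exact([A[i] for i in rows], [b[i] for i in rows])
        if x is not None and all(
            sum(a * xi for a, xi in zip(A[i], x)) <= b[i] for i in range(len(A))
        ):
            found.add(x)
    return found


# The unit cube C_3 = [0,1]^3 written as 6 inequalities A x <= b.
cube_A = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [-1, 0, 0], [0, -1, 0], [0, 0, -1]]
cube_b = [1, 1, 1, 0, 0, 0]
print(len(vertices(cube_A, cube_b)))  # the 8 points of {0,1}^3
```

The same function applied to $x \ge 0$, $y \ge 0$, $x + y \le 1$ returns the three vertices of a triangle, a 2-simplex.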
For polygons and polyhedra, we have the familiar concepts of vertices,
edges, and 2-faces. For higher-dimensional convex polytopes, we can de-
fine their faces as follows: a face of P is a subset $F \subseteq P$ of the form

$$P \cap \{x \in \mathbb{R}^d : a^T x = b\},$$

where $a^T x \le b$ is a linear inequality that is valid for all points $x \in P$.

All the faces of a polytope are themselves polytopes. The set V of vertices
(0-dimensional faces) of a convex polytope is also the inclusion-minimal set
such that conv(V) = P. Assuming that $P \subseteq \mathbb{R}^d$ is a d-dimensional convex
polytope, the facets (the (d−1)-dimensional faces) determine a minimal set
of hyperplanes and thus of halfspaces that contain P, and whose intersec-
tion is P. In particular, this implies the following fact that we will need
later: Let F be a facet of P, denote by $H_F$ the hyperplane it determines,
and by $H_F^+$ and $H_F^-$ the two closed half-spaces bounded by $H_F$. Then one
of these two halfspaces contains P (and the other one doesn't).

The graph G(P) of the convex polytope P is given by the set V of ver- 
tices, and by the edge set E of 1-dimensional faces. If P has dimension 3, 
then this graph is planar, and gives rise to the famous “Euler polyhedron 
formula” (see Chapter 13). 
Two polytopes $P, P' \subseteq \mathbb{R}^d$ are congruent if there is some length-preserving
affine map that takes P to P'. Such a map may reverse the orientation
of space, as does the reflection of P in a hyperplane, which takes P to
a mirror image of P. They are combinatorially equivalent if there is a
bijection from the faces of P to the faces of P' that preserves dimension
and inclusions between the faces. This notion of combinatorial equivalence
is much weaker than congruence: for example, our figure shows a unit cube
and a "skew" cube that are combinatorially equivalent (and thus we would
call any one of them "a cube"), but they are certainly not congruent.

A polytope (or a more general subset of $\mathbb{R}^d$) is called centrally symmetric
if there is some point $a_0 \in \mathbb{R}^d$ such that

$$a_0 + x \in P \iff a_0 - x \in P.$$

In this situation we call $a_0$ the center of P.
Combinatorially equivalent polytopes


Chapter 11
Lines in the plane and decompositions of graphs


Perhaps the best-known problem on configurations of lines was raised by 
Sylvester in 1893 in a column of mathematical problems. 


QUESTIONS FOR SOLUTION.

11851. (Professor SYLVESTER.) Prove that it is not possible to
arrange any finite number of real points so that a right line through
every two of them shall pass through a third, unless they all lie in the
same right line.
Whether Sylvester himself had a proof is in doubt, but a correct proof was
given by Tibor Gallai [Grünwald] some 40 years later. Therefore the fol-
lowing theorem is commonly attributed to Sylvester and Gallai. Subsequent
to Gallai's proof several others appeared, but the following argument due
to L. M. Kelly may be "simply the best."


Theorem 1. In any configuration of n points in the plane, not all on a line,
there is a line which contains exactly two of the points.


Proof. Let $\mathcal{P}$ be the given set of points and consider the set $\mathcal{L}$ of all lines
which pass through at least two points of $\mathcal{P}$. Among all pairs $(P, \ell)$ with P
not on ℓ, choose a pair $(P_0, \ell_0)$ such that $P_0$ has the smallest distance to $\ell_0$,
with Q being the point on $\ell_0$ closest to $P_0$ (that is, on the line through $P_0$
perpendicular to $\ell_0$).


Claim. This line $\ell_0$ does it!


J. J. Sylvester 


If not, then $\ell_0$ contains at least three points of $\mathcal{P}$, and thus two of them, say
$P_1$ and $P_2$, lie on the same side of Q. Let us assume that $P_1$ lies between
Q and $P_2$, where $P_1$ possibly coincides with Q. The figure on the right
shows the configuration. It follows that the distance of $P_1$ to the line $\ell_1$
determined by $P_0$ and $P_2$ is smaller than the distance of $P_0$ to $\ell_0$, and this
contradicts our choice for $\ell_0$ and $P_0$.
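Kelly's argument is constructive enough to run. This sketch assumes distinct points with integer coordinates so that lines and squared distances can be compared exactly; `determined_lines` and `kelly_line` are hypothetical names. It enumerates all point/line pairs with the point off the line, picks one of minimal distance, and returns the points on that line, which by the proof must be exactly two.

```python
from fractions import Fraction
from itertools import combinations
from math import gcd


def determined_lines(points):
    """Map each line through >= 2 of the points to the set of points on it.

    A line a*x + b*y = c is stored as a normalized integer triple (a, b, c).
    """
    lines = {}
    for p, q in combinations(points, 2):
        a, b = q[1] - p[1], p[0] - q[0]
        c = a * p[0] + b * p[1]
        g = gcd(gcd(a, b), c)
        a, b, c = a // g, b // g, c // g
        if a < 0 or (a == 0 and b < 0):
            a, b, c = -a, -b, -c
        lines.setdefault((a, b, c), set()).update((p, q))
    return lines


def kelly_line(points):
    """Points on the line of a closest pair (point, line) with the point off the line."""
    lines = determined_lines(points)
    best = None
    for (a, b, c), on_line in lines.items():
        for p in points:
            if p in on_line:
                continue
            dist2 = Fraction((a * p[0] + b * p[1] - c) ** 2, a * a + b * b)  # squared distance
            if best is None or dist2 < best[0]:
                best = (dist2, (a, b, c))
    if best is None:
        raise ValueError("all points lie on one line")
    return lines[best[1]]


grid = [(x, y) for x in range(3) for y in range(3)]
print(len(kelly_line(grid)))  # prints 2
```

The dictionary returned by `determined_lines` also gives a quick sanity check of Theorem 2 below: for the 3 × 3 grid it has more than 9 entries, and for a "pencil" of four collinear points plus one extra point it has exactly 5.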


In the proof we have used metric axioms (shortest distance) and order
axioms ($P_1$ lies between Q and $P_2$) of the real plane. Do we really need
these properties beyond the usual incidence axioms of points and lines?
Well, some additional condition is required, as the famous Fano plane de-
picted in the margin demonstrates. Here $\mathcal{P} = \{1, 2, \dots, 7\}$ and $\mathcal{L}$ consists
of the 7 three-point lines as indicated in the figure, including the "line"
$\{4, 5, 6\}$. Any two points determine a unique line, so the incidence axioms
are satisfied, but there is no 2-point line. The Sylvester–Gallai theorem
therefore shows that the Fano configuration cannot be embedded into the
real plane such that the seven collinear triples lie on real lines: there must
always be a "crooked" line.
However, it was shown by Coxeter that the order axioms will suffice for
a proof of the Sylvester–Gallai theorem. Thus one can devise a proof that
does not use any metric properties; see also the proof that we will give in
Chapter 13, using Euler's formula.

Armed with Theorem 1, we may ask how many such two-point lines every
n-point configuration in the plane must contain. After many partial results,
the definitive answer was given very recently by Ben Green and Terence
Tao: there is a constant $n_0$ such that every configuration of $n \ge n_0$ points,
not all on a line, contains at least n/2 two-point lines, and this is best pos-
sible if n is even. In the case when n is odd, they prove that there are
even at least $3\lfloor n/4 \rfloor$ such lines, and again this is best possible.


The Sylvester–Gallai theorem directly implies another famous result on
points and lines in the plane, due to Paul Erdős and Nicolaas G. de Bruijn.
But in this case the result holds more generally for arbitrary point-line
systems, as was observed already by Erdős and de Bruijn. We will discuss
the more general result in a moment.


Theorem 2. Let $\mathcal{P}$ be a set of $n \ge 3$ points in the plane, not all on a line.
Then the set $\mathcal{L}$ of lines passing through at least two points contains at least
n lines.


Proof. For n = 3 there is nothing to show. Now we proceed by induction
on n. Let $|\mathcal{P}| = n + 1$. By the previous theorem there exists a line $\ell_0 \in \mathcal{L}$
containing exactly two points P and Q of $\mathcal{P}$. Consider the set $\mathcal{P}' = \mathcal{P} \setminus \{Q\}$
and the set $\mathcal{L}'$ of lines determined by $\mathcal{P}'$. If the points of $\mathcal{P}'$ do not all lie
on a single line, then by induction $|\mathcal{L}'| \ge n$ and hence $|\mathcal{L}| \ge n + 1$ because
of the additional line $\ell_0$ in $\mathcal{L}$. If, on the other hand, the points in $\mathcal{P}'$ are all
on a single line, then we have the "pencil" which results in precisely n + 1
lines.


Now, as promised, here is the general result, which applies to much more 
general “incidence geometries.” 


Theorem 3. Let X be a set of $n \ge 3$ elements, and let $A_1, \dots, A_m$ be
proper subsets of X, such that every pair of elements of X is contained in
precisely one set $A_j$. Then $m \ge n$ holds.


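Proof. The following proof, variously attributed to Motzkin or Conway,
is almost one-line and truly inspired. For $x \in X$ let $r_x$ be the number of
sets $A_j$ containing x. (Note that $2 \le r_x < m$ by the assumptions.) Now if
$x \notin A_j$, then $r_x \ge |A_j|$ because the $|A_j|$ sets containing x and an element
of $A_j$ must be distinct. Suppose $m < n$; then $m|A_j| < n\,r_x$ and thus
$m(n - |A_j|) > n(m - r_x)$ for $x \notin A_j$. Since for fixed x there are $m - r_x$ sets
not containing x, and for fixed $A_j$ there are $n - |A_j|$ elements outside it,
we find

$$1 = \sum_{x \in X} \frac{1}{n} = \sum_{x \in X}\ \sum_{A_j:\, x \notin A_j} \frac{1}{n(m - r_x)} > \sum_{A_j}\ \sum_{x:\, x \notin A_j} \frac{1}{m(n - |A_j|)} = \sum_{A_j} \frac{1}{m} = 1,$$

which is absurd.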
There is another very short proof for this theorem that uses linear algebra.


Let B be the incidence matrix of $(X; A_1, \dots, A_m)$, that is, the rows in B
are indexed by the elements of X, the columns by $A_1, \dots, A_m$, where

$$B_{x,A} = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases}$$

Consider the product $BB^T$. For $x \ne x'$ we have $(BB^T)_{xx'} = 1$, since x
and x' are contained in precisely one set $A_j$, hence

$$BB^T = \begin{pmatrix} r_{x_1}-1 & 0 & \cdots & 0 \\ 0 & r_{x_2}-1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & r_{x_n}-1 \end{pmatrix} + \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix},$$

where $r_x$ is defined as above. Since the first matrix is positive definite (it has
only positive eigenvalues, as $r_x \ge 2$) and the second matrix is positive semi-definite
(it has the eigenvalues n and 0), we deduce that $BB^T$ is positive definite
and thus, in particular, invertible, implying $\operatorname{rank}(BB^T) = n$. It follows that
the rank of the $n \times m$ matrix B is at least n, and we conclude that indeed
$n \le m$, since the rank cannot exceed the number of columns.
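The linear-algebra proof can be checked on small examples with exact arithmetic. In this sketch (hypothetical helpers; the Fano labeling with lines $\{i, i+1, i+3\} \bmod 7$ is a standard one and need not match the figure in the margin), the Fano plane satisfies the hypotheses of Theorem 3, $BB^T$ has the predicted form, and B has full row rank n, so $m \ge n$; here m = n = 7, so equality is attained.

```python
from fractions import Fraction
from itertools import combinations


def is_linear_space(points, sets):
    """Hypotheses of Theorem 3: proper subsets, every pair in exactly one set."""
    pairs_ok = all(sum(1 for A in sets if x in A and y in A) == 1
                   for x, y in combinations(points, 2))
    return pairs_ok and all(set(A) != set(points) for A in sets)


def incidence_matrix(points, sets):
    """Rows indexed by points, columns by sets: 1 if the point lies in the set."""
    return [[1 if x in A else 0 for A in sets] for x in points]


def gram(B):
    """The product B B^T."""
    return [[sum(u * v for u, v in zip(r, s)) for s in B] for r in B]


def rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    r = 0
    for col in range(len(M[0]) if M else 0):
        piv = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


points = list(range(1, 8))
fano = [{1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {4, 5, 7}, {5, 6, 1}, {6, 7, 2}, {7, 1, 3}]
B = incidence_matrix(points, fano)
print(is_linear_space(points, fano), rank(B), len(fano))  # prints True 7 7
```

Every point of the Fano plane lies on $r_x = 3$ lines, so $BB^T$ has 3 on the diagonal and 1 elsewhere. A "near-pencil" on four points (one 3-point line plus three 2-point lines) gives another case with m = n.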


Let us go a little beyond and turn to graph theory. (We refer to the review of 
basic graph concepts in the appendix to this chapter.) A moment’s thought 
shows that the following statement is really the same as Theorem 3: 


If we decompose a complete graph $K_n$ into m cliques different
from $K_n$, such that every edge is in a unique clique, then $m \ge n$.


Indeed, let X correspond to the vertex set of $K_n$, and the sets $A_j$ to the
vertex sets of the cliques; then the statements are identical.

Our next task is to decompose $K_n$ into complete bipartite graphs such that
again every edge is in exactly one of these graphs. There is an easy way to
do this. Number the vertices $\{1, 2, \dots, n\}$. First take the complete bipartite
graph joining 1 to all other vertices. Thus we obtain the graph $K_{1,n-1}$,
which is called a star. Next join 2 to $3, \dots, n$, resulting in a star $K_{1,n-2}$.
Going on like this, we decompose $K_n$ into stars $K_{1,n-1}, K_{1,n-2}, \dots, K_{1,1}$.
This decomposition uses n − 1 complete bipartite graphs. Can we do better,
that is, use fewer graphs? No, as the following result of Ron Graham and
Henry O. Pollak says:
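The star decomposition is easy to generate and verify mechanically. A minimal sketch, with hypothetical helper names, that represents each complete bipartite graph by its two vertex classes (L, R) and counts how often each edge of $K_n$ is covered:

```python
from itertools import combinations


def star_decomposition(n):
    """Stars K_{1,n-1}, ..., K_{1,1} on vertices 1..n, each as a pair (L, R)."""
    return [({i}, set(range(i + 1, n + 1))) for i in range(1, n)]


def is_biclique_decomposition(n, parts):
    """True if every edge of K_n lies in exactly one complete bipartite graph (L, R)."""
    count = {frozenset(e): 0 for e in combinations(range(1, n + 1), 2)}
    for L, R in parts:
        if L & R:
            return False  # the two classes must be disjoint
        for a in L:
            for b in R:
                count[frozenset((a, b))] += 1
    return all(v == 1 for v in count.values())


parts = star_decomposition(5)
print(len(parts), is_biclique_decomposition(5, parts))  # prints 4 True
```

Stars are not the only optimal choice: for instance $K_4$ also splits into $K_{2,2}$ on $\{1,2\} \times \{3,4\}$ plus the two edges 12 and 34, again n − 1 = 3 parts.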


Theorem 4. If $K_n$ is decomposed into complete bipartite subgraphs
$H_1, \dots, H_m$, then $m \ge n - 1$.


The interesting thing is that, in contrast to the Erdős–de Bruijn theorem, no
combinatorial proof for this result is known! All of them use linear algebra
in one way or another. Of the various more or less equivalent ideas let us
look at the proof due to Tverberg, which may be the most transparent.


A decomposition of $K_5$ into 4 complete
bipartite subgraphs


Proof. Let the vertex set of $K_n$ be $\{1, \dots, n\}$, and let $L_j, R_j$ be the
defining vertex sets of the complete bipartite graph $H_j$, $j = 1, \dots, m$.
To every vertex i we associate a variable $x_i$. Since $H_1, \dots, H_m$ decom-
pose $K_n$, we find

$$\sum_{i<j} x_i x_j = \sum_{k=1}^{m} \Big( \sum_{a \in L_k} x_a \cdot \sum_{b \in R_k} x_b \Big). \tag{1}$$

Now suppose the theorem is false, $m < n - 1$. Then the system of linear
equations

$$x_1 + \dots + x_n = 0, \qquad \sum_{a \in L_k} x_a = 0 \quad (k = 1, \dots, m)$$

has fewer equations than variables, hence there exists a nontrivial solution
$c_1, \dots, c_n$. From (1) we infer

$$\sum_{i<j} c_i c_j = 0.$$

But this implies

$$0 = (c_1 + \dots + c_n)^2 = \sum_{i=1}^{n} c_i^2 + 2\sum_{i<j} c_i c_j = \sum_{i=1}^{n} c_i^2 > 0,$$

a contradiction, and the proof is complete.
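Identity (1) holds for every decomposition of $K_n$ into complete bipartite graphs, because each product $x_i x_j$ with i < j appears exactly once on the right-hand side. The sketch below (hypothetical helper `identity_sides`) evaluates both sides for a given decomposition and a given assignment of numbers to the vertices; the decomposition of $K_4$ used here, into $K_{2,2}$ and two copies of $K_{1,1}$, is our own illustration.

```python
from itertools import combinations


def identity_sides(n, parts, x):
    """Left and right side of identity (1) for bicliques parts = [(L_k, R_k), ...].

    x maps each vertex 1..n to a number.
    """
    lhs = sum(x[i] * x[j] for i, j in combinations(range(1, n + 1), 2))
    rhs = sum(sum(x[a] for a in L) * sum(x[b] for b in R) for L, R in parts)
    return lhs, rhs


parts = [({1, 2}, {3, 4}), ({1}, {2}), ({3}, {4})]  # a decomposition of K_4
x = {1: 2, 2: -3, 3: 5, 4: 7}
print(identity_sides(4, parts, x))  # prints (17, 17)
```

If a part is dropped, so that some edge is no longer covered, the two sides generally differ, which is a quick way to see that (1) really encodes the decomposition property.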
Appendix: Basic graph concepts 
Graphs are among the most basic of all mathematical structures. Corre-
spondingly, they have many different versions, representations, and incar-
nations. Abstractly, a graph is a pair G = (V, E), where V is the set of
vertices, E is the set of edges, and each edge $e \in E$ "connects" two ver-
tices $v, w \in V$. We consider only finite graphs, where V and E are finite.
Usually, we deal with simple graphs: then we admit neither loops, i.e.,
edges for which both ends coincide, nor multiple edges that have the
same set of endvertices. Vertices of a graph are called adjacent or neighbors
if they are the endvertices of an edge. A vertex and an edge are called
incident if the edge has the vertex as an endvertex.


A graph G with 7 vertices and 11 edges.
It has one loop, one double edge and one
triple edge.


The complete graphs $K_n$ on n vertices
and $\binom{n}{2}$ edges


Here is a little picture gallery of important (simple) graphs: 
Two graphs G = (V, E) and G' = (V', E') are considered isomorphic if
there are bijections V → V' and E → E' that preserve the incidences be-
tween edges and their endvertices. (It is a major unsolved problem whether
there is an efficient test to decide whether two given graphs are isomorphic.)
This notion of isomorphism allows us to talk about the complete graph $K_5$
on 5 vertices, etc.

G' = (V', E') is a subgraph of G = (V, E) if $V' \subseteq V$, $E' \subseteq E$, and every
edge $e \in E'$ has the same endvertices in G' as in G. G' is an induced
subgraph if, additionally, all edges of G that connect vertices of G' are also
edges of G'.

Many notions about graphs are quite intuitive: for example, a graph G is 
connected if every two distinct vertices are connected by a path in G, or 
equivalently, if G cannot be split into two nonempty subgraphs whose ver- 
tex sets are disjoint. Any graph decomposes into its connected components. 


We end this survey of basic graph concepts with a few more pieces of ter-
minology: A clique in G is a complete subgraph. An independent set in G
is an induced subgraph without edges, that is, a subset of the vertex set such
that no two vertices are connected by an edge of G. A graph is a forest if it
does not contain any cycles. A tree is a connected forest. Finally, a graph
G = (V, E) is bipartite if it is isomorphic to a subgraph of a complete bi-
partite graph, that is, if its vertex set can be written as a union $V = V_1 \cup V_2$
of two independent sets.
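Bipartiteness in this sense can be tested by breadth-first search: give a start vertex one color and alternate colors along edges; a conflict (an edge, or a loop, inside one color class) shows that no split into two independent sets exists, since within a connected component the coloring is forced. This standard 2-coloring procedure is not discussed in the book; the sketch and its function name `two_coloring` are illustrative.

```python
from collections import deque


def two_coloring(vertices, edges):
    """Split the vertex set into two independent sets (V1, V2), or return None.

    Standard breadth-first 2-coloring; a loop or an odd cycle makes it fail.
    """
    vertices = list(vertices)
    adj = {v: [] for v in vertices}
    for u, w in edges:
        adj[u].append(w)
        adj[w].append(u)
    color = {}
    for start in vertices:                # one search per connected component
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]   # neighbors get the other color
                    queue.append(w)
                elif color[w] == color[u]:
                    return None               # an edge inside one class
    return ({v for v in vertices if color[v] == 0},
            {v for v in vertices if color[v] == 1})


print(two_coloring(range(4), [(0, 1), (1, 2), (2, 3), (3, 0)]))  # the cycle C_4
```

For the even cycle $C_4$ this yields the classes {0, 2} and {1, 3}; for the odd cycle $C_5$ it returns None.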


The complete bipartite graphs $K_{m,n}$
with m + n vertices and mn edges


The paths $P_n$ with n vertices


The cycles $C_n$ with n vertices
Chapter 12
The slope problem


Try for yourself, before you read much further, to construct configura-
tions of points in the plane that determine "relatively few" slopes. For this
we assume, of course, that the $n \ge 3$ points do not all lie on one line. Re-
call from Chapter 11 on "Lines in the plane" the theorem of Erdős and de
Bruijn: the n points will determine at least n different lines. But of course
many of these lines may be parallel, and thus determine the same slope.


Figure: two sequences of configurations with n = 3, 4, 5, 6, 7 points,
determining 3, 4, 4, 6, 6 slopes, respectively. A little experimentation for
small n will probably lead you to a sequence such as the two depicted here.


After some attempts at finding configurations with fewer slopes you might
conjecture, as Scott did in 1970, the following theorem.


Theorem. If $n \ge 3$ points in the plane do not lie on one single line,
then they determine at least $n - 1$ different slopes, where equality is
possible only if n is odd and $n \ge 5$.


Our examples above (the drawings represent the first few configurations
in two infinite sequences of examples) show that the theorem as stated is
best possible: for any odd $n \ge 5$ there is a configuration with n points that
determines exactly n − 1 different slopes, and for any other $n \ge 3$ we have
a configuration with exactly n slopes.
Three pretty sporadic examples from the
Jamison–Hill catalogue


This configuration of n = 6 points
determines t = 6 different slopes.


Here a vertical starting direction yields
$\pi_0 = 123456$.


However, the configurations that we have drawn above are far from the
only ones. For example, Jamison and Hill described four infinite families
of configurations, each of them consisting of configurations with an odd
number n of points that determine only n − 1 slopes ("slope-critical con-
figurations"). Furthermore, they listed 102 "sporadic" examples that do not
seem to fit into an infinite family, most of them found by extensive com-
puter searches.


Conventional wisdom might say that extremal problems tend to be very 
difficult to solve exactly if the extreme configurations are so diverse and 
irregular. Indeed, there is a lot that can be said about the structure of slope- 
critical configurations (see [2]), but a classification seems completely out 
of reach. However, the theorem above has a simple proof, which has two 
main ingredients: a reduction to an efficient combinatorial model due to 
Eli Goodman and Ricky Pollack, and a beautiful argument in this model by 
which Peter Ungar completed the proof in 1982. 


Proof. (1) First we notice that it suffices to show that every "even" set
of n = 2m points in the plane ($m \ge 2$) determines at least n slopes. This
is so since the case n = 3 is trivial, and for any set of $n = 2m + 1 \ge 5$
points (not all on a line) we can find a subset of n − 1 = 2m points, not all
on a line, which already determines n − 1 slopes.


Thus for the following we consider a configuration of n = 2m points in the
plane that determines $t \ge 2$ different slopes.


(2) The combinatorial model is obtained by constructing a periodic se-
quence of permutations. For this we start with some direction in the plane
that is not one of the configuration's slopes, and we number the points
$1, \dots, n$ in the order in which they appear in the 1-dimensional projection
in this direction. Thus the permutation $\pi_0 = 123\ldots n$ represents the order
of the points for our starting direction.


Next let the direction perform a counterclockwise motion, and watch how 
the projection and its permutation change. Changes in the order of the 
projected points appear exactly when the direction passes over one of the 
configuration’s slopes. 


But the changes are far from random or arbitrary: by performing a 180°
rotation of the direction, we obtain a sequence of permutations

$$\pi_0 \to \pi_1 \to \pi_2 \to \cdots \to \pi_{t-1} \to \pi_t$$


which has the following special properties: 


• The sequence starts with $\pi_0 = 123\ldots n$ and ends with $\pi_t = n\ldots321$.


• The length t of the sequence is the number of slopes of the point con-
figuration.


• In the course of the sequence, every pair i < j is switched exactly
once. This means that on the way from $\pi_0 = 123\ldots n$ to $\pi_t = n\ldots321$,
only increasing substrings are reversed.

• Every move consists in the reversal of one or more disjoint increasing
substrings (corresponding to the one or more lines that have the direc-
tion which we pass at this point).


Figure: the small example, with $\pi_0 = 123456$, $\pi_1 = 213546$, and $\pi_t = 654321$.


By continuing the circular motion around the configuration, one can view
the sequence as a part of a two-way infinite, periodic sequence of permuta-
tions

$$\cdots \to \pi_{-1} \to \pi_0 \to \cdots \to \pi_t \to \pi_{t+1} \to \cdots \to \pi_{2t} \to \cdots$$

where $\pi_{i+t}$ is the reverse of $\pi_i$ for all i, and thus $\pi_{i+2t} = \pi_i$ for all $i \in \mathbb{Z}$.


We will show that every sequence with the above properties (and $t \ge 2$)
must have length $t \ge n$.
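The combinatorial model of step (2) can be generated from concrete coordinates. The sketch below assumes distinct integer points, not all on one line; `slope_sequence` is a hypothetical name. It collects the distinct slopes exactly as reduced direction vectors, then samples one projection direction strictly between consecutive slopes (using floating point for the angles only) and records the order of the projected points, labeled so that $\pi_0 = 12\ldots n$.

```python
import math


def slope_sequence(points):
    """Permutations pi_0, ..., pi_t of step (2) for distinct integer points, not all collinear."""
    slopes = set()
    for i, (x1, y1) in enumerate(points):
        for x2, y2 in points[i + 1:]:
            dx, dy = x2 - x1, y2 - y1
            g = math.gcd(dx, dy)
            dx, dy = dx // g, dy // g
            if dy < 0 or (dy == 0 and dx < 0):
                dx, dy = -dx, -dy  # canonical direction, angle in [0, pi)
            slopes.add((dx, dy))
    angles = sorted(math.atan2(dy, dx) for dx, dy in slopes)
    # Start strictly between the last slope (shifted by -pi) and the first one, take one
    # direction between any two consecutive slopes, and finish after a half turn.
    start = (angles[-1] - math.pi + angles[0]) / 2
    thetas = [start] + [(a + b) / 2 for a, b in zip(angles, angles[1:])] + [start + math.pi]

    def order(theta):
        # Project along direction theta: sort by the coordinate orthogonal to it.
        s, c = math.sin(theta), math.cos(theta)
        return sorted(range(len(points)), key=lambda k: -points[k][0] * s + points[k][1] * c)

    label = {k: pos + 1 for pos, k in enumerate(order(start))}
    return [tuple(label[k] for k in order(theta)) for theta in thetas]


square = [(0, 0), (1, 0), (0, 1), (1, 1)]
seq = slope_sequence(square)
print(len(seq) - 1, seq[0], seq[-1])  # number of slopes t, pi_0, pi_t
```

For the four corners of a square, t = 4 = n and the sequence runs from 1234 to 4321; for the five points (0,0), (2,0), (1,1), (0,2), (2,2) it has length t = 4 = n − 1.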


(3) The proof's key is to divide each permutation into a "left half" and a
"right half" of equal size $m = \frac{n}{2}$, and to count the letters that cross the
imaginary barrier between the left half and the right half.

Call $\pi_i \to \pi_{i+1}$ a crossing move if one of the substrings it reverses does
involve letters from both sides of the barrier. The crossing move has order
d if it moves 2d letters across the barrier, that is, if the crossing string has
exactly d letters on one side and at least d letters on the other side. Thus in
our example

$$\pi_2 = 213{:}564 \to 265{:}314 = \pi_3$$

is a crossing move of order d = 2 (it moves 1, 3, 5, 6 across the barrier,
which we mark by ":"),


$$652{:}341 \to 654{:}321$$

is crossing of order d = 1, while for example

$$625{:}314 \to 652{:}341$$

is not a crossing move.


Getting the sequence of permutations
for our small example


A crossing move


An ordinary move


In the course of the sequence $\pi_0 \to \pi_1 \to \cdots \to \pi_t$, each of the letters
$1, 2, \dots, n$ has to cross the barrier at least once. This implies that, if the
orders of the c crossing moves are $d_1, d_2, \dots, d_c$, then we have

$$\sum_{i=1}^{c} 2d_i = \#\{\text{crossings of the barrier by letters}\} \ge n.$$


This also implies that we have at least two crossing moves, since a crossing
move with $2d_i = n$ occurs only if all the points are on one line, i.e. for
t = 1. Geometrically, a crossing move corresponds to the direction of a
line of the configuration that has fewer than m points on each side.


(4) A touching move is a move that reverses some string that is adjacent to
the central barrier, but does not cross it. For example,

$$\pi_4 = 625{:}314 \to 652{:}341 = \pi_5$$

is a touching move. Geometrically, a touching move corresponds to the
slope of a line of the configuration that has exactly m points on one side,
and hence at most m − 2 points on the other side.


Moves that are neither touching nor crossing will be called ordinary moves.
For instance,

$$\pi_1 = 213{:}546 \to 213{:}564 = \pi_2$$

is an example. So every move is either crossing, or touching, or ordinary,
and we can use the letters T, C, O to denote the types of moves. C(d) will
denote a crossing move of order d. Thus for our small example we get

$$\pi_0 \xrightarrow{T} \pi_1 \xrightarrow{O} \pi_2 \xrightarrow{C(2)} \pi_3 \xrightarrow{O} \pi_4 \xrightarrow{T} \pi_5 \xrightarrow{C(1)} \pi_6,$$

or even shorter we can record this sequence as T, O, C(2), O, T, C(1).


(5) To complete the proof, we need the following two facts: 


Between any two crossing moves, there is at least one touching 
move. 


Between any crossing move of order d and the next touching move,
there are at least d − 1 ordinary moves.


In fact, after a crossing move of order d the barrier is contained in a sym- 
metric decreasing substring of length 2d, with d letters on each side of the 
barrier. For the next crossing move the central barrier must be brought into 
an increasing substring of length at least 2. But only touching moves affect 
whether the barrier is in an increasing substring. This yields the first fact. 
For the second fact, note that with each ordinary move (reversing some
increasing substrings) the decreasing 2d-string can get shortened by only
one letter on each side. And, as long as the decreasing string has at least 4
letters, a touching move is impossible. This yields the second fact.


If we construct the sequence of permutations starting with the same initial 
projection but using a clockwise rotation, then we obtain the reversed se- 
quence of permutations. Thus the sequence that we do have recorded must 
also satisfy the opposite of our second fact: 


Between a touching move and the next crossing move, of order d,
there are at least d − 1 ordinary moves.


(6) The T-O-C pattern of the infinite sequence of permutations, as derived
in (2), is obtained by repeating over and over again the T-O-C pattern of
length t of the sequence $\pi_0 \to \cdots \to \pi_t$. Thus with the facts of (5) we
see that in the infinite sequence of moves, each crossing move of order d is
embedded into a T-O-C pattern of the type

$$T, \underbrace{O, O, \ldots, O}_{\ge d-1}, C(d), \underbrace{O, O, \ldots, O}_{\ge d-1} \qquad (*)$$

of length at least $1 + (d-1) + 1 + (d-1) = 2d$.
In the infinite sequence, we may consider a finite segment of length t that
starts with a touching move. This segment consists of substrings of the
type (*), plus possibly extra inserted T's. This implies that its length t
satisfies

$$t \ge \sum_{i=1}^{c} 2d_i \ge n,$$

which completes the proof.
Chapter 13
Three applications of Euler's formula


A graph is planar if it can be drawn in the plane $\mathbb{R}^2$ without crossing edges
(or, equivalently, on the 2-dimensional sphere $S^2$). We talk of a plane graph
if such a drawing is already given and fixed. Any such drawing decomposes 
the plane or sphere into a finite number of connected regions, including 
the outer (unbounded) region, which are referred to as faces. Euler’s for- 
mula exhibits a beautiful relation between the number of vertices, edges 
and faces that is valid for any plane graph. Euler mentioned this result for 
the first time in a letter to his friend Goldbach in 1750, but he did not have 
a complete proof at the time. Among the many proofs of Euler’s formula, 
we present a pretty and “self-dual” one that gets by without induction. It 
can be traced back to von Staudt’s book “Geometrie der Lage” from 1847. 


Euler’s formula. If $G$ is a connected plane graph with $n$ vertices, $e$ edges and $f$ faces, then

$$n - e + f = 2.$$


Proof. Let $T \subseteq E$ be the edge set of a spanning tree for $G$, that is, of a minimal subgraph that connects all the vertices of $G$. This graph does not contain a cycle because of the minimality assumption.

We now need the dual graph $G^*$ of $G$: To construct it, put a vertex into the interior of each face of $G$, and connect two such vertices of $G^*$ by edges that correspond to common boundary edges between the corresponding faces. If there are several common boundary edges, then we draw several connecting edges in the dual graph. (Thus $G^*$ may have multiple edges even if the original graph $G$ is simple.)

Consider the collection $T^* \subseteq E^*$ of edges in the dual graph that correspond to the edges in $E \setminus T$. The edges in $T^*$ connect all the faces, since $T$ does not have a cycle; but also $T^*$ does not contain a cycle, since otherwise it would separate some vertices of $G$ inside the cycle from vertices outside (and this cannot be, since $T$ is a spanning subgraph, and the edges of $T$ and of $T^*$ do not intersect). Thus $T^*$ is a spanning tree for $G^*$.

For every tree the number of vertices is one larger than the number of edges. To see this, choose one vertex as the root, and direct all edges “away from the root”: this yields a bijection between the non-root vertices and the edges, by matching each edge with the vertex it points at. Applied to the tree $T$ this yields $n = e_T + 1$, while for the tree $T^*$ it yields $f = e_{T^*} + 1$. Since every edge of $G$ lies either in $T$ or in $E \setminus T$, we have $e_T + e_{T^*} = e$, and adding both equations we get

$$n + f = (e_T + 1) + (e_{T^*} + 1) = e + 2.$$
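Euler’s formula can be tested on concrete drawings by computing the faces combinatorially. The following is a minimal sketch, assuming a connected graph without loops or multiple edges, given as a crossing-free straight-line drawing (vertex coordinates plus an edge list); `count_faces` is a hypothetical helper. It uses the standard face-tracing technique on the rotation system (neighbours sorted by angle around each vertex), which is swapped in here for illustration and is not part of the proof above.

```python
import math

def count_faces(coords, edges):
    """Count the faces of a connected, crossing-free straight-line drawing."""
    if not edges:
        return 1  # a single vertex: only the outer face
    nbrs = {v: [] for v in coords}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    # Rotation system: the neighbours of each vertex sorted by angle.
    rot = {}
    for v, ns in nbrs.items():
        x0, y0 = coords[v]
        rot[v] = sorted(ns, key=lambda w: math.atan2(coords[w][1] - y0, coords[w][0] - x0))
    # Every edge yields two darts (directed edges); the faces are the orbits
    # of the map (u, v) -> (v, w), where w follows u in the rotation at v.
    unvisited = {(u, v) for u, v in edges} | {(v, u) for u, v in edges}
    faces = 0
    while unvisited:
        start = dart = unvisited.pop()
        faces += 1
        while True:
            u, v = dart
            ring = rot[v]
            dart = (v, ring[(ring.index(u) + 1) % len(ring)])
            if dart == start:
                break
            unvisited.remove(dart)
    return faces

# K_4 drawn as a triangle with one vertex in its interior
k4 = {0: (0, 0), 1: (0, 1), 2: (-1, -1), 3: (1, -1)}
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 1)]
n, e, f = len(k4), len(k4_edges), count_faces(k4, k4_edges)
print(n, e, f, n - e + f)  # 4 6 4 2
```

For this drawing of $K_4$ the sketch finds four faces (three inner triangles and the outer face), matching $n - e + f = 4 - 6 + 4 = 2$.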

[Figure: The five platonic solids]


[Figure: Here the degree is written next to each vertex. Counting the vertices of given degree yields $n_2 = 3$, $n_3 = 0$, $n_4 = 1$, $n_5 = 2$.]

[Figure: The number of sides is written into each region. Counting the faces with a given number of sides yields $f_1 = 1$, $f_2 = 3$, $f_4 = 1$, $f_9 = 1$, and $f_i = 0$ otherwise.]


Euler’s formula thus produces a strong numerical conclusion from a geometric-topological situation: the numbers of vertices, edges, and faces of a finite connected graph $G$ satisfy $n - e + f = 2$ whenever the graph is or can be drawn in the plane or on a sphere.

Many well-known and classical consequences can be derived from Euler’s formula. Among them are the classification of the regular convex polyhedra (the platonic solids), the fact that $K_5$ and $K_{3,3}$ are not planar (see below), and the five-color theorem that every planar map can be colored with at most five colors such that no two adjacent countries have the same color. But for this we have a much better proof, which does not even need Euler’s formula; see Chapter 39.

This chapter collects three other beautiful proofs that have Euler’s formula at their core. The first two, a proof of the Sylvester–Gallai theorem and a theorem on two-colored point configurations, use Euler’s formula in clever combination with other arithmetic relationships between basic graph parameters. Let us first look at these parameters.


The degree of a vertex is the number of edges that end in the vertex, where loops count double. Let $n_i$ denote the number of vertices of degree $i$ in $G$. Counting the vertices according to their degrees, we obtain

$$n = n_0 + n_1 + n_2 + n_3 + \cdots \qquad (1)$$


On the other hand, every edge has two ends, so it contributes 2 to the sum of all degrees, and we obtain

$$2e = n_1 + 2n_2 + 3n_3 + 4n_4 + \cdots \qquad (2)$$


You may interpret this identity as counting in two ways the ends of the edges, that is, the edge-vertex incidences. The average degree $\bar d$ of the vertices is therefore

$$\bar d = \frac{2e}{n}.$$


Next we count the faces of a plane graph according to their number of sides: a $k$-face is a face that is bounded by $k$ edges (where an edge that on both sides borders the same region has to be counted twice!). Let $f_k$ be the number of $k$-faces. Counting all faces we find

$$f = f_1 + f_2 + f_3 + f_4 + \cdots \qquad (3)$$

Counting the edges according to the faces of which they are sides, we get

$$2e = f_1 + 2f_2 + 3f_3 + 4f_4 + \cdots \qquad (4)$$


As before, we can interpret this as double-counting of edge-face incidences. Note that the average number of sides of faces is given by

$$\bar f = \frac{2e}{f}.$$
Let us quickly deduce from this, together with Euler’s formula, that the complete graph $K_5$ and the complete bipartite graph $K_{3,3}$ are not planar. For a hypothetical plane drawing of $K_5$ we calculate $n = 5$, $e = \binom{5}{2} = 10$, thus $f = e + 2 - n = 7$ and $\bar f = \frac{2e}{f} = \frac{20}{7} < 3$. But if the average number of sides is smaller than 3, then the embedding would have a face with at most two sides, which cannot be.


Similarly for $K_{3,3}$ we get $n = 6$, $e = 9$, and $f = e + 2 - n = 5$, and thus $\bar f = \frac{2e}{f} = \frac{18}{5} < 4$, which cannot be since $K_{3,3}$ is simple and bipartite, so all its cycles have length at least 4.

[Figure: $K_5$ drawn with one crossing]


It is no coincidence, of course, that the equations (3) and (4) for the $f_i$’s look so similar to the equations (1) and (2) for the $n_i$’s. They are transformed into each other by the dual graph construction $G \to G^*$ explained above.
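The identities (2) and (4), combined with Euler’s formula, already yield the numerical part of the classification of the platonic solids mentioned above: if every vertex has degree $d$ and every face is a $k$-gon, then $nd = 2e = fk$. The sketch below (the function `regular_types` is hypothetical) searches all $d, k \ge 3$ up to a bound; for $d \ge 6$ or $k \ge 6$ the coefficient $2/d + 2/k - 1$ is at most 0, so the bound 10 loses nothing. It only produces the possible numbers $(n, e, f)$; that each of them is realized by a convex polyhedron is a separate geometric fact not shown here.

```python
from fractions import Fraction

def regular_types(max_size=10):
    """All (d, k, n, e, f) with every vertex of degree d >= 3, every face a k-gon
    (k >= 3), n*d = 2e = f*k as in (2) and (4), and n - e + f = 2."""
    result = []
    for d in range(3, max_size + 1):
        for k in range(3, max_size + 1):
            # Substituting n = 2e/d and f = 2e/k into Euler's formula gives
            # e * (2/d + 2/k - 1) = 2, so the coefficient must be positive.
            coeff = Fraction(2, d) + Fraction(2, k) - 1
            if coeff <= 0:
                continue
            e = 2 / coeff
            if e.denominator != 1:
                continue
            e = int(e)
            if (2 * e) % d or (2 * e) % k:
                continue
            result.append((d, k, 2 * e // d, e, 2 * e // k))
    return result

for d, k, n, e, f in regular_types():
    print(f"degree {d}, {k}-gonal faces: n = {n}, e = {e}, f = {f}")
```

The five solutions found correspond, in order, to the tetrahedron, cube, dodecahedron, octahedron and icosahedron.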


From the double counting identities, we get the following important “local” consequences of Euler’s formula.

[Figure: $K_{3,3}$ drawn with one crossing]


Proposition. Let G be any simple plane graph with n > 2 vertices. Then 
(A) G has at most 3n — 6 edges. 
(B) G has a vertex of degree at most 5. 


(C) If the edges of G are two-colored, then there is a vertex of G with at 
most two color-changes in the cyclic order of the edges around the 
vertex. 


Proof. For each of the three statements, we may assume that G is con- 
nected. 


(A) Every face has at least 3 sides (since $G$ is simple), so (3) and (4) yield

$$f = f_3 + f_4 + f_5 + \cdots$$

and

$$2e = 3f_3 + 4f_4 + 5f_5 + \cdots$$

and thus $2e - 3f \ge 0$. Euler’s formula now gives

$$3n - 6 = 3e - 3f \ge e.$$


(B) By part (A), the average degree $\bar d$ satisfies

$$\bar d = \frac{2e}{n} \le \frac{6n - 12}{n} < 6.$$

So there must be a vertex of degree at most 5.
[Figure: Arrows point to the corners with color changes.]


(C) Let $c$ be the number of corners where color changes occur. Suppose the statement is false; then we have $c \ge 4n$ corners with color changes, since at every vertex there is an even number of changes. Now every face with $2k$ or $2k + 1$ sides has at most $2k$ such corners, so we conclude that

$$\begin{aligned}
4n \le c &\le 2f_3 + 4f_4 + 4f_5 + 6f_6 + 6f_7 + 8f_8 + \cdots \\
&\le 2f_3 + 4f_4 + 6f_5 + 8f_6 + 10f_7 + \cdots \\
&= 2(3f_3 + 4f_4 + 5f_5 + 6f_6 + 7f_7 + \cdots) - 4(f_3 + f_4 + f_5 + f_6 + f_7 + \cdots) \\
&= 4e - 4f,
\end{aligned}$$

using again (3) and (4). So we have $e \ge n + f$, again contradicting Euler’s formula.
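The quantity $c$ in the proof of (C) is the sum, over all vertices, of the number of color changes in the cyclic order of the edges. A small sketch with a hypothetical helper `color_changes` (colors given as a sequence in cyclic order around one vertex, here the letters b and w) computes this local count and checks on random inputs that it is always even, which is the parity fact used to get $c \ge 4n$.

```python
import random

def color_changes(colors):
    """Count color changes in the cyclic order of the edge colors around a vertex."""
    m = len(colors)
    return sum(colors[i] != colors[(i + 1) % m] for i in range(m))

print(color_changes("bbww"), color_changes("bwbw"))  # 2 4

# Going once around the vertex returns to the starting color,
# so the number of changes is always even.
random.seed(0)
for _ in range(1000):
    seq = [random.choice("bw") for _ in range(random.randint(1, 8))]
    assert color_changes(seq) % 2 == 0
```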


1. The Sylvester–Gallai theorem, revisited


It was first noted by Norman Steenrod, it seems, that part (B) of the proposition yields a strikingly simple proof of the Sylvester–Gallai theorem (see Chapter 11).


The Sylvester–Gallai theorem. Given any set of $n \ge 3$ points in the plane, not all on one line, there is always a line that contains exactly two of the points.


Proof. (Sylvester–Gallai via Euler)

If we embed the plane $\mathbb{R}^2$ in $\mathbb{R}^3$ near the unit sphere $S^2$ as indicated in our figure, then every point in $\mathbb{R}^2$ corresponds to a pair of antipodal points on $S^2$, and the lines in $\mathbb{R}^2$ correspond to great circles on $S^2$. Thus the Sylvester–Gallai theorem amounts to the following:


Given any set of $n \ge 3$ pairs of antipodal points on the sphere, not all on one great circle, there is always a great circle that contains exactly two of the antipodal pairs.


Now we dualize, replacing each pair of antipodal points by the corresponding great circle on the sphere. That is, instead of points $\pm v \in S^2$ we consider the orthogonal circles given by $C_v = \{x \in S^2 : \langle x, v\rangle = 0\}$. (This $C_v$ is the equator if we consider $v$ as the north pole of the sphere.)


Then the Sylvester—Gallai problem asks us to prove: 


Given any collection of $n \ge 3$ great circles on $S^2$, not all of them passing through one point, there is always a point that is on exactly two of the great circles.


But the arrangement of great circles yields a simple plane graph on $S^2$, whose vertices are the intersection points of two of the great circles, which divide the great circles into edges. By construction, all the vertex degrees are even and at least 4. Thus part (B) of the proposition, which provides a vertex of degree at most 5, yields the existence of a vertex of degree exactly 4, that is, a point on exactly two of the great circles. That’s it!
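For small point sets the Sylvester–Gallai theorem can also be checked by brute force. Note that this is a direct search over all pairs of points, not the Euler-formula argument above; `collinear` and `ordinary_line` are hypothetical helpers, and integer coordinates are assumed so that the collinearity test is exact.

```python
from itertools import combinations

def collinear(p, q, r):
    """Exact collinearity test for integer points via the cross product."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]) == 0

def ordinary_line(points):
    """Return a pair (p, q) whose line contains no further point, or None."""
    for p, q in combinations(points, 2):
        if not any(collinear(p, q, r) for r in points if r != p and r != q):
            return p, q
    return None

grid = [(x, y) for x in range(3) for y in range(3)]
print(ordinary_line(grid))                       # ((0, 0), (1, 2))
print(ordinary_line([(0, 0), (1, 1), (2, 2)]))   # None: all points on one line
```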




2. Monochromatic lines 


The following proof of a “colorful” relative of the Sylvester–Gallai theorem
is due to Don Chakerian. 


Theorem. Given any finite configuration of “black” and “white” points 
in the plane, not all on one line, there is always a “monochromatic” line: 
a line that contains at least two points of one color and none of the other. 


Proof. As for the Sylvester–Gallai problem, we transfer the problem to the unit sphere and dualize it there. So we must prove:


Given any finite collection of “black” and “white” great circles on the unit 
sphere, not all passing through one point, there is always an intersection 
point that lies either only on white great circles, or only on black great 
circles. 


Now the (positive) answer is clear from part (C) of the proposition, since 
in every vertex where great circles of different colors intersect, we always 
have at least 4 corners with sign changes. 
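As before, a brute-force search (not the dualization argument) can find a monochromatic line in small examples; `monochromatic_line` is a hypothetical helper and integer coordinates are assumed. In the example, each line through two black points also contains a white point, so the line found runs through two white points.

```python
from itertools import combinations

def collinear(p, q, r):
    """Exact collinearity test for integer points via the cross product."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]) == 0

def monochromatic_line(black, white):
    """Return two points of one color whose line avoids all points of the other color."""
    for own, other in ((black, white), (white, black)):
        for p, q in combinations(own, 2):
            if not any(collinear(p, q, r) for r in other):
                return p, q
    return None

black = [(0, 0), (2, 0), (0, 2)]
white = [(1, 0), (0, 1), (1, 1)]  # the midpoints of the black triangle's sides
print(monochromatic_line(black, white))  # ((1, 0), (0, 1))
```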


3. Pick’s theorem 


Pick’s theorem from 1899 is a beautiful and surprising result in itself, but it is also a “classical” consequence of Euler’s formula. For the following, call a convex polygon $P \subset \mathbb{R}^2$ elementary if its vertices are integral (that is, they lie in the lattice $\mathbb{Z}^2$), but it does not contain any further lattice points.


Lemma. Every elementary triangle $\Delta = \operatorname{conv}\{p_0, p_1, p_2\} \subset \mathbb{R}^2$ has area $A(\Delta) = \frac{1}{2}$.


Proof. Both the parallelogram $P$ with corners $p_0, p_1, p_2, p_1 + p_2 - p_0$ and the lattice $\mathbb{Z}^2$ are symmetric with respect to the map

$$\sigma : x \mapsto p_1 + p_2 - x,$$

which is the reflection with respect to the center of the segment from $p_1$ to $p_2$. Thus the parallelogram $P = \Delta \cup \sigma(\Delta)$ is elementary as well, and its integral translates tile the plane. Hence $\{p_1 - p_0, p_2 - p_0\}$ is a basis of the lattice $\mathbb{Z}^2$, it has determinant $\pm 1$, $P$ is a parallelogram of area 1, and $\Delta$ has area $\frac{1}{2}$. (For an explanation of these terms see the box “Lattice bases” below.)


Theorem. The area of any (not necessarily convex) polygon $Q \subset \mathbb{R}^2$ with integral vertices is given by

$$A(Q) = n_{\text{int}} + \tfrac{1}{2}\, n_{\text{bd}} - 1,$$

where $n_{\text{int}}$ and $n_{\text{bd}}$ are the numbers of integral points in the interior respectively on the boundary of $Q$.


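[Figure: the parallelogram with corners $p_0, p_1, p_2, p_1 + p_2 - p_0$; an example polygon with $n_{\text{int}} = 11$, $n_{\text{bd}} = 8$, so $A = 14$.]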
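Lattice bases


A basis of $\mathbb{Z}^2$ is a pair of linearly independent vectors $e_1, e_2$ such that
$$\mathbb{Z}^2 = \{\lambda_1 e_1 + \lambda_2 e_2 : \lambda_1, \lambda_2 \in \mathbb{Z}\}.$$

Let $e_1 = \begin{pmatrix} a \\ b \end{pmatrix}$ and $e_2 = \begin{pmatrix} c \\ d \end{pmatrix}$; then the area of the parallelogram spanned by $e_1$ and $e_2$ is given by $A(e_1, e_2) = |\det(e_1, e_2)| = \left|\det\begin{pmatrix} a & c \\ b & d \end{pmatrix}\right|$. If $f_1 = \begin{pmatrix} r \\ s \end{pmatrix}$ and $f_2 = \begin{pmatrix} t \\ u \end{pmatrix}$ is another basis, then there exists an invertible $\mathbb{Z}$-matrix $Q$ with $\begin{pmatrix} r & t \\ s & u \end{pmatrix} = \begin{pmatrix} a & c \\ b & d \end{pmatrix} Q$. Since $QQ^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and the determinants are integers, it follows that $|\det Q| = 1$, and hence $|\det(f_1, f_2)| = |\det(e_1, e_2)|$. Therefore all basis parallelograms have the same area 1, since $A\left(\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}\right) = 1$.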
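In coordinates, the box reduces to determinant computations. The sketch below uses hypothetical helpers `det2`, `is_lattice_basis` and `triangle_area`; the test $|\det(u, v)| = 1$ for a basis of $\mathbb{Z}^2$ also relies on the converse of the argument in the box (an integer matrix with determinant $\pm 1$ has an integer inverse), a standard fact not proved here. The triangle with vertices $(0,0), (2,1), (1,1)$ spans a lattice basis and therefore has area $\frac{1}{2}$, in line with the lemma.

```python
from fractions import Fraction

def det2(u, v):
    """Determinant of the 2x2 matrix with columns u and v."""
    return u[0] * v[1] - u[1] * v[0]

def is_lattice_basis(u, v):
    """Integer vectors u, v form a basis of Z^2 exactly when |det(u, v)| = 1."""
    return abs(det2(u, v)) == 1

def triangle_area(p0, p1, p2):
    """Area of conv{p0, p1, p2}: half the parallelogram spanned by p1 - p0, p2 - p0."""
    u = (p1[0] - p0[0], p1[1] - p0[1])
    v = (p2[0] - p0[0], p2[1] - p0[1])
    return Fraction(abs(det2(u, v)), 2)

print(is_lattice_basis((2, 1), (1, 1)), is_lattice_basis((2, 0), (0, 1)))  # True False
print(triangle_area((0, 0), (2, 1), (1, 1)))  # 1/2
```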


Proof. Every such polygon can be triangulated using all the $n_{\text{int}}$ lattice points in the interior, and all the $n_{\text{bd}}$ lattice points on the boundary of $Q$. (This is not quite obvious, in particular if $Q$ is not required to be convex, but the argument given in Chapter 40 on the art gallery problem proves this.)


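Now we interpret the triangulation as a plane graph, which subdivides the plane into one unbounded face plus $f - 1$ triangles, each of area $\frac{1}{2}$ by the lemma (the triangles are elementary), so

$$A(Q) = \tfrac{1}{2}(f - 1).$$

Every triangle has three sides, where each of the $e_{\text{int}}$ interior edges bounds two triangles, while the $e_{\text{bd}}$ boundary edges appear in one single triangle each. So $3(f - 1) = 2e_{\text{int}} + e_{\text{bd}}$, and since $e = e_{\text{int}} + e_{\text{bd}}$, this gives $f = 2(e - f) - e_{\text{bd}} + 3$. Also, there is the same number of boundary edges and vertices, $e_{\text{bd}} = n_{\text{bd}}$. These two facts, together with Euler’s formula in the form $e - f = n - 2$ and with $n = n_{\text{int}} + n_{\text{bd}}$, yield

$$f = 2(e - f) - e_{\text{bd}} + 3 = 2(n - 2) - n_{\text{bd}} + 3 = 2n_{\text{int}} + n_{\text{bd}} - 1,$$

and thus

$$A(Q) = \tfrac{1}{2}(f - 1) = n_{\text{int}} + \tfrac{1}{2}\, n_{\text{bd}} - 1.$$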
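Pick’s formula is easy to test numerically. In the following sketch (all helper names are hypothetical), the area comes from the shoelace formula, $n_{\text{bd}}$ from the fact that a lattice segment with displacement $(dx, dy)$ contains $\gcd(|dx|, |dy|) + 1$ lattice points including both endpoints, and $n_{\text{int}}$ from a brute-force point-in-polygon test by exact ray casting. It assumes a simple polygon with integer vertices listed in order; the example is a nonconvex L-shape.

```python
from fractions import Fraction
from math import gcd

def edges_of(poly):
    """Consecutive vertex pairs of a polygon, closing the cycle."""
    return list(zip(poly, poly[1:] + poly[:1]))

def shoelace_area(poly):
    """Area of a simple polygon whose vertices are listed in order."""
    return Fraction(abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in edges_of(poly))), 2)

def boundary_points(poly):
    """An edge with displacement (dx, dy) adds gcd(|dx|, |dy|) boundary lattice points."""
    return sum(gcd(abs(x2 - x1), abs(y2 - y1)) for (x1, y1), (x2, y2) in edges_of(poly))

def on_segment(p, a, b):
    """Is the lattice point p on the closed segment from a to b?"""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross == 0 and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

def interior_points(poly):
    """Brute-force count of interior lattice points by exact ray casting."""
    edges = edges_of(poly)
    xs, ys = [x for x, _ in poly], [y for _, y in poly]
    count = 0
    for px in range(min(xs), max(xs) + 1):
        for py in range(min(ys), max(ys) + 1):
            if any(on_segment((px, py), a, b) for a, b in edges):
                continue  # boundary point, not interior
            inside = False
            for (x1, y1), (x2, y2) in edges:
                if (y1 > py) != (y2 > py):  # edge crosses the horizontal line y = py
                    x_cross = x1 + Fraction((py - y1) * (x2 - x1), y2 - y1)
                    if px < x_cross:
                        inside = not inside
            count += inside
    return count

L_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]  # nonconvex
n_int, n_bd = interior_points(L_shape), boundary_points(L_shape)
print(n_int, n_bd, shoelace_area(L_shape), n_int + Fraction(n_bd, 2) - 1)  # 5 16 12 12
```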
Cauchy’s rigidity theorem (Chapter 14)




A famous result that depends on Euler’s formula (specifically, on part (C) 
of the proposition in the previous chapter) is Cauchy’s rigidity theorem for 
3-dimensional polyhedra. 

For the notions of congruence and of combinatorial equivalence that are 
used in the following we refer to the appendix on polytopes and polyhedra 
in the chapter on Hilbert’s third problem, see page 73. 


Theorem. If two 3-dimensional convex polyhedra $P$ and $P'$ are combinatorially equivalent with corresponding facets being congruent, then also the angles between corresponding pairs of adjacent facets are equal (and thus $P$ is congruent to $P'$).


The illustration in the margin shows two 3-dimensional polyhedra that are 
combinatorially equivalent, such that the corresponding faces are congru- 
ent. But they are not congruent, and only one of them is convex. Thus the 
assumption of convexity is essential for Cauchy’s theorem! 


Proof. The following is essentially Cauchy’s original proof. Assume
that two convex polyhedra P and P’ with congruent faces are given. We 
color the edges of P as follows: an edge is black (or “positive”) if the 
corresponding interior angle between the two adjacent facets is larger in P’ 
than in P; it is white (or “negative”) if the corresponding angle is smaller 
in P’ than in P. 

The black and the white edges of P together form a 2-colored plane graph 
on the surface of P, which by radial projection, assuming that the origin 
is in the interior of P, we may transfer to the surface of the unit sphere. 
If P and P’ have unequal corresponding facet-angles, then the graph is 
nonempty. With part (C) of the proposition in the previous chapter we find 
that there is a vertex p that is adjacent to at least one black or white edge, 
such that there are at most two changes between black and white edges (in 
cyclic order). 


Now we intersect $P$ with a small sphere $S_\varepsilon$ (of radius $\varepsilon$) centered at the vertex $p$, and we intersect $P'$ with a sphere $S'_\varepsilon$ of the same radius $\varepsilon$ centered at the corresponding vertex $p'$. In $S_\varepsilon$ and $S'_\varepsilon$ we find convex spherical polygons $Q$ and $Q'$ such that corresponding arcs have the same lengths, because of the congruence of the facets of $P$ and $P'$, and since we have chosen the same radius $\varepsilon$.


Now we mark by + the angles of $Q$ for which the corresponding angle in $Q'$ is larger, and by − the angles whose corresponding angle of $Q'$ is smaller. That is, when moving from $Q$ to $Q'$ the + angles are “opened,” the − angles are “closed,” while all side lengths and the unmarked angles stay constant.

From our choice of $p$ we know that some + or − sign occurs, and that in cyclic order there are at most two +/− changes. If only one type of signs occurs, then the lemma below directly gives a contradiction, saying that one edge must change its length. If both types of signs occur, then (since there are only two sign changes) there is a “separation line” that connects the midpoints of two edges and separates all the + signs from all the − signs. Again we get a contradiction from the lemma below, since the separation line cannot be both longer and shorter in $Q'$ than in $Q$.


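Cauchy’s arm lemma.
If $Q$ and $Q'$ are convex (planar or spherical) $n$-gons, labeled as in the figure,

[Figure: the polygons $Q$ and $Q'$ with vertices $q_1, \dots, q_n$ resp. $q'_1, \dots, q'_n$ and angles $\alpha_2, \dots, \alpha_{n-1}$ resp. $\alpha'_2, \dots, \alpha'_{n-1}$]

such that $|q_i q_{i+1}| = |q'_i q'_{i+1}|$ holds for the lengths of corresponding edges for $1 \le i \le n-1$, and $\alpha_i \le \alpha'_i$ holds for the sizes of corresponding angles for $2 \le i \le n-1$, then the “missing” edge length satisfies

$$|q_1 q_n| \le |q'_1 q'_n|,$$

with equality if and only if $\alpha_i = \alpha'_i$ holds for all $i$.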
It is interesting that Cauchy’s original proof of the lemma was false: a continuous motion that opens angles and keeps side-lengths fixed may destroy convexity; see the figure! On the other hand, both the lemma and its proof given here, from a letter by I. J. Schoenberg to S. K. Zaremba, are valid both for planar and for spherical polygons.


Proof. We use induction on $n$. The case $n = 3$ is easy: If in a triangle we increase the angle $\gamma$ between two sides of fixed lengths $a$ and $b$, then the length $c$ of the opposite side also increases. Analytically, this follows from the cosine theorem

$$c^2 = a^2 + b^2 - 2ab\cos\gamma$$

in the planar case, and from the analogous result

$$\cos c = \cos a \cos b + \sin a \sin b \cos\gamma$$

in spherical trigonometry. Here the lengths $a, b, c$ are measured on the surface of a sphere of radius 1, and thus have values in the interval $[0, \pi]$.
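The case $n = 3$ can be illustrated numerically. The sketch below (function names are hypothetical) evaluates both cosine theorems for the sample values $a = 1$, $b = 0.7$ and checks on a grid of angles $\gamma \in (0, \pi)$ that $c$ increases strictly; this is a sanity check for these values, not a proof.

```python
import math

def planar_c(a, b, gamma):
    """Third side from the planar cosine theorem c^2 = a^2 + b^2 - 2ab cos(gamma)."""
    return math.sqrt(a * a + b * b - 2 * a * b * math.cos(gamma))

def spherical_c(a, b, gamma):
    """Third side on the unit sphere: cos c = cos a cos b + sin a sin b cos(gamma)."""
    cos_c = math.cos(a) * math.cos(b) + math.sin(a) * math.sin(b) * math.cos(gamma)
    return math.acos(max(-1.0, min(1.0, cos_c)))  # clamp tiny rounding errors

gammas = [i * math.pi / 100 for i in range(1, 100)]
for third_side in (planar_c, spherical_c):
    values = [third_side(1.0, 0.7, g) for g in gammas]
    # c grows strictly as the angle gamma opens
    assert all(x < y for x, y in zip(values, values[1:]))
print("c increases with gamma in both cases")
```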
Now let $n \ge 4$. If for any $i \in \{2, \dots, n-1\}$ we have $\alpha_i = \alpha'_i$, then the corresponding vertex can be cut off by introducing the diagonal from $q_{i-1}$ to $q_{i+1}$, resp. from $q'_{i-1}$ to $q'_{i+1}$, with $|q_{i-1}q_{i+1}| = |q'_{i-1}q'_{i+1}|$, so we are done by induction. Thus we may assume $\alpha_i < \alpha'_i$ for $2 \le i \le n-1$.

Now we produce a new polygon $Q^*$ from $Q$ by replacing $\alpha_{n-1}$ by the largest possible angle $\alpha^*_{n-1} \le \alpha'_{n-1}$ that keeps $Q^*$ convex. For this we replace $q_n$ by $q^*_n$, keeping all the other $q_i$, edge lengths, and angles from $Q$.

If indeed we can choose $\alpha^*_{n-1} = \alpha'_{n-1}$ keeping $Q^*$ convex, then we get $|q_1q_n| < |q_1q^*_n| \le |q'_1q'_n|$, using the case $n = 3$ for the first step and induction as above for the second.

Otherwise after a nontrivial move that yields

$$|q_1q^*_n| > |q_1q_n| \qquad (1)$$

we “get stuck” in a situation where $q_2$, $q_1$ and $q^*_n$ are collinear, with

$$|q_2q_1| + |q_1q^*_n| = |q_2q^*_n|. \qquad (2)$$

Now we compare this $Q^*$ with $Q'$ and find

$$|q_2q^*_n| \le |q'_2q'_n| \qquad (3)$$

by induction on $n$ (ignoring the vertex $q_1$ resp. $q'_1$). Thus we obtain

$$|q'_1q'_n| \overset{(*)}{\ge} |q'_2q'_n| - |q'_1q'_2| \overset{(3)}{\ge} |q_2q^*_n| - |q_1q_2| \overset{(2)}{=} |q_1q^*_n| \overset{(1)}{>} |q_1q_n|,$$

where $(*)$ is just the triangle inequality, and all other relations have already been derived.


We have seen an example which shows that Cauchy’s theorem is not true 
for nonconvex polyhedra. The special feature of this example is, of course, 
that a noncontinuous “flip” takes one polyhedron to the other, keeping the 
facets congruent while the dihedral angles “jump.” One can ask for more: 


Could there be, for some nonconvex polyhedron, a continuous 
deformation that would keep the facets flat and congruent? 


It was conjectured that no triangulated surface, convex or not, admits such a motion. So, it was quite a surprise when in 1977, more than 160 years after Cauchy’s work, Robert Connelly presented counterexamples: closed triangulated spheres embedded in $\mathbb{R}^3$ (without self-intersections) that are flexible, with a continuous motion that keeps all the edge lengths constant, and thus keeps the triangular faces congruent.
[Figure: A beautiful example of a flexible surface constructed by Klaus Steffen: The dashed lines represent the nonconvex edges in this “cut-out” paper model. Fold the normal lines as “mountains” and the dashed lines as “valleys.” The edges in the model have lengths 5, 10, 11, 12 and 17 units.]


The rigidity theory of surfaces has even more surprises in store: Idjad 
Sabitov managed to prove that when any such flexing surface moves, the 
volume it encloses must be constant. His proof is beautiful also in its use 
of the algebraic machinery of polynomials and determinants (outside the 
scope of this book). 
The Borromean rings don’t exist (Chapter 15)




The “Borromean rings” (three rings arranged so that no two of them are linked, but the configuration cannot be taken apart without breaking one of the rings) form a classic artistic symbol, which has appeared in the coat of arms of the aristocratic Borromeo family since the middle of the 15th century.


The Borromean rings are also one of the most tantalizing and enigmatic 
“impossible figures” of mathematics. They can easily be built as a geomet- 
ric object in such a way that two of the rings are perfectly round circles of 
the same size; it seems, however, that then the third ring is represented by 
an ellipse, at best. Thus it is natural to ask: 


Can the Borromean rings be built from three perfect circles? 


As mathematical objects, the Borromean rings belong to the theory of knots 
and links, which very attractively connects geometry, topology, and com- 
binatorics. We all have a geometric picture of what knots (closed curves 
in space) and links (arrangements of several such curves) look like, and we 
can draw them in the plane. We also have intuitive notions of when two 
knots or links are “the same” (equivalent), when a knot or link is “trivial,” 
when two circles are linked, etc.: The appendix to this chapter provides a 
review of the essential terms and definitions, including the fact that two dia- 
grams present the same link or knot if and only if they can be transformed 
into each other by a finite sequence of “Reidemeister moves.” 


Knot theory as we know it today started in 1867, when the physicist William 
Thomson, now known as Lord Kelvin, came up with his “vortex theory,” 
according to which atoms could be explained as knots in the “ether” back- 
ground of the universe. Kelvin’s theory was immensely popular at the 
time and led to considerable efforts in the enumeration and classification 
of knots and links. Kelvin’s coauthor and colleague, the Scottish physicist 
Peter Guthrie Tait, published the first knot tables in 1876. He displayed and 
discussed the following links: 


[Figure: Tait’s links No. 15, 16, 17 and 18]
[Figure: Two linked circles]


In this display, No. 15 shows the Borromean rings, while No. 18 is an apparently different link that, however, shares the same characteristics: It consists of three closed curves that are pairwise not linked, whereas the whole diagram does not seem to come apart; it represents a nontrivial link. Tait indeed claimed that the links No. 15 and No. 18 were not equivalent, apparently based on the assumption that any alternating diagram of a link (where along any string under- and over-crossings alternate) has a minimal number of crossings among all possible diagrams. This long-standing “Tait conjecture” was proved more than 100 years later, by Thistlethwaite, Kauffman, and Murasugi in 1987. (Tait’s examples No. 16 and 17 have only one component, so they are knots. All four examples fall into a larger family that has been described and studied as the “Turk’s head links.”)

In 1892, the geometer Hermann Brunn introduced a much more general 
family of objects that we now call Brunnian links: k-component links in 
which any subcollection of k — 1 of the components is trivial. Tait’s links 
No. 15 (the Borromean rings) and No. 18 are examples. 


Back to the Borromean rings: Indeed they cannot be built from three perfect circles. The first proof for this appeared in 1987 in a long differential geometry paper by Michael F. Freedman and Richard Skora. Their beautiful geometric idea, “getting movies from spherical domes,” is very powerful: It solves the problem not only for the Borromean rings, but shows that any Brunnian link built from perfect circles is trivial. It can also be generalized to links formed by $k$-spheres in $(2k + 1)$-dimensional space. Our presentation is based on a short unpublished note “Circle links” by Ian Agol.


Theorem 1. If a link consists of disjoint perfect circles that are pairwise not linked, then the link is trivial.


Proof. Moving each of the circles just a little bit, we may assume that
they lie in planes that are distinct, no two of the planes are parallel, and 
none of the planes spanned by one of the circles contains the center of a 
second circle. (This first preparatory step is not necessary, but it simplifies 
some later parts of the proof quite a bit.) 

There are several different ways to define what it means that two disjoint 
circles in $\mathbb{R}^3$ are linked. Let us here use the following: Two circles are
linked if one of them intersects (and not only touches) the disk spanned by 
the other one exactly once. 

Let the circles be $C, C' \subset \mathbb{R}^3$, let $D, D'$ be the flat disks they bound, and let $H, H'$ be the planes they span. If $C'$ intersects the disk $D$ in one point, then this point lies both in $D \subset H$ as well as on $C' \subset D' \subset H'$, so in particular it lies in the intersection of the two planes $H$ and $H'$, which is a line, $L := H \cap H'$. As this line lies in the plane $H$ and contains a point in the interior of the disk $D$, it intersects $C$ in exactly two points. The circle $C'$ intersects the plane $H$ once in the interior of $D$, so there has to be a second intersection point, which lies again on the line $L$, but outside $D$.
We conclude that there are two pairs of intersection points given by $C \cap L$ and $C' \cap L$, and these two pairs alternate on the line $L$. In particular, we find in this situation that also $C$ intersects the disk $D'$ in one point.


It turns out that this “alternating property” characterizes linked circles: If two circles $C, C'$ are not linked, then one of them misses (or only touches) the disk spanned by the other one. In that case we find fewer than four points of $C \cup C'$ on the line $L$, or the four points do not alternate.


For the proof of the theorem we now take a configuration of $n$ circles in $\mathbb{R}^3$ that are pairwise not linked and erect spherical domes above the disks spanned by the circles. This entails a bold step into the fourth dimension, since we add an extra coordinate. Don’t worry about how to visualize this: in the end we will look at these dome functions defined on lines, so all arguments can be visualized and verified in planar diagrams.

The spherical domes are constructed as follows: For any circle $C \subset \mathbb{R}^3$ with center $c$ and radius $r$ there is a 2-dimensional hemisphere $S \subset \mathbb{R}^4$, which may be obtained as the graph

$$\{(x, h(x)) \in \mathbb{R}^3 \times \mathbb{R} : x \in D\}$$

of the function

$$h : D \to \mathbb{R}, \qquad h(x) = \sqrt{r^2 - |x - c|^2}, \qquad \text{so that } |x - c|^2 + |h(x) - 0|^2 = r^2,$$

on the closed disk $D$ spanned by the circle $C$. The dome $S$ is orthogonal above $D$ in the following sense: If we project it to $\mathbb{R}^3$ by the orthogonal projection $\pi : \mathbb{R}^4 \to \mathbb{R}^3$, $(x, t) \mapsto x$, that “forgets the last coordinate,” then the image of the dome will be the disk $D$.


Claim. If two disjoint circles $C, C' \subset \mathbb{R}^3$ are not linked, then their spherical domes $S, S' \subset \mathbb{R}^3 \times \mathbb{R}$ do not intersect.


Proof of the Claim. We prove that if the domes $S, S'$ above the discs $D, D'$ spanned by two disjoint circles $C, C' \subset \mathbb{R}^3$ intersect, then the circles are linked. For this, let $(x_0, t_0)$ be a point in the intersection $S \cap S'$. As $(x_0, t_0)$ lies in $S$, we get $x_0 \in D$. Similarly, as $(x_0, t_0)$ lies in $S'$, we get that $x_0 \in D'$. Hence $x_0$ lies in the line $L$, and it also lies on $D \cap D'$, where both “lifting functions” $h$ and $h'$ are defined.
[Figure: The half-circles above $L$ intersect if and only if their end points alternate on the line $L$.]


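The lifting functions $h, h'$ describe spherical domes defined on $D$ resp. $D'$. Restricted to the line $L$, the functions $h$ and $h'$ define perfect half-circles, with domain of definition $D \cap L$ resp. $D' \cap L$. (Indeed, $L$ lies in the plane $H$ of $C$; if $p$ is the point of $L$ closest to the center $c$, then $|x - c|^2 = |x - p|^2 + |p - c|^2$ for $x \in L$, so $h$ restricted to $L$ is the half-circle of radius $\sqrt{r^2 - |p - c|^2}$ centered at $p$. This is the crucial point in the proof: Above an ellipse one cannot build a dome that restricts to half-circle arcs.)

Since the half-circles above $D \cap L$ resp. $D' \cap L$ intersect, their pairs of end points $C \cap L$ and $C' \cap L$ alternate on $L$, as illustrated in the margin. Hence the circles $C$ and $C'$ are linked. This finishes the proof of the claim.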
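The fact used in this step, that two half-circles standing on the same line meet above it exactly when their end points alternate, can be checked exhaustively on small examples with four distinct integer end points. In the sketch (helper names are hypothetical, exact rational arithmetic), any common point of the two full circles lies on the vertical line obtained by subtracting the two circle equations, and the half-circles meet precisely when the circles meet on that line at positive height.

```python
from fractions import Fraction
from itertools import combinations

def alternate(a, b, c, d):
    """Do the end point pairs a < b and c < d alternate on the line?"""
    return a < c < b < d or c < a < d < b

def half_circles_cross(a, b, c, d):
    """Do the upper half-circles with diameters [a, b] and [c, d] meet above the line?"""
    m1, r1 = Fraction(a + b, 2), Fraction(b - a, 2)
    m2, r2 = Fraction(c + d, 2), Fraction(d - c, 2)
    if m1 == m2:
        return False  # concentric circles with different radii never meet
    # Subtracting (x - m1)^2 + y^2 = r1^2 and (x - m2)^2 + y^2 = r2^2 leaves a linear
    # equation in x: any common point lies on this vertical line.
    x = (r1**2 - r2**2 + m2**2 - m1**2) / (2 * (m2 - m1))
    return r1**2 - (x - m1)**2 > 0  # y^2 > 0: a common point strictly above the line

# Exhaustive check: split four distinct integers into two pairs in all three ways.
for p, q, r, s in combinations(range(7), 4):
    for (a, b), (c, d) in (((p, q), (r, s)), ((p, r), (q, s)), ((p, s), (q, r))):
        assert half_circles_cross(a, b, c, d) == alternate(a, b, c, d)
print("half-circles meet exactly when their end points alternate")
```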


Back to the configuration of disjoint perfect circles in $\mathbb{R}^3$ that are pairwise not linked. Freedman and Skora’s brilliant idea was to use the disjoint domes guaranteed by the claim in order to construct a “movie” that shows us how to separate the circles in the link by a continuous motion. For this, we identify the original space $\mathbb{R}^3$, which contains the link, with the slice $\mathbb{R}^3 \times \{0\}$ of the space $\mathbb{R}^3 \times \mathbb{R}$ that contains the domes; that is, the extra coordinate $t$ is interpreted as time, and we start our movie at $t = 0$ with the original link. If we now continuously increase the fourth (time) coordinate, then what we see in time slices $\mathbb{R}^3 \times \{t\}$ is a movie in which each of the circles shrinks to a point, and then disappears.


[Figure: the time slices $H_0 = \mathbb{R}^3 \times \{0\}$ and $H_t = \mathbb{R}^3 \times \{t\}$]


Here is a key observation: While a circle shrinks in this movie, the center of 
the circle and the plane spanned by the circle do not change. Furthermore, 
the circles stay disjoint since the domes are disjoint by the claim, and thus 
they remain pairwise non-linked. 

We can stop the shrinking for each circle at some time when the circle is so small that it no longer intersects a plane that is spanned by any one of the other circles. Moreover, also the disk spanned by this little circle does not intersect any of the other circle planes, neither at the point of time where we stop its shrinking, nor at any later time.

Thus the movie will end with all circles shrunk so far that they have disjoint 
spanning disks: The circles are completely separate, and thus the link is 
trivial. 


In particular, we have just proved that any Brunnian link built from perfect 
circles can be taken apart in a motion which maintains perfect circles along 
the way. It remains an open problem, however, whether each of the circles 
could keep its size in such a motion picture. 
With Theorem 1, we have established that the Borromean rings cannot be built from perfect circles, assuming that we know that the Borromean rings form a nontrivial link. Do we know that? It is by no means easy to prove rigorously for any knot or link that it is nontrivial ... However, the eminent knot theorist Ralph Fox invented a strikingly simple method to achieve this. Reportedly he designed it “in an effort to make the subject accessible to everyone” while teaching knot theory to undergraduates at Haverford College in 1956. Its first published trace can be found in an exercise of a 1963 knot theory textbook by Crowell and Fox. Thirty years later Ollie Nanyes observed that Fox’s method also solves the problem for the Borromean rings.


Theorem 2. The Borromean rings are nontrivial, and they are also 
not equivalent to Tait’s link No. 18. 


Proof. For every $n \ge 2$, a Fox $n$-labeling of a link diagram labels each arc of the diagram by an integer modulo $n$, such that at each crossing the two integers $a$ and $c$ of the arcs that end at the crossing and the label $b$ of the arc of the overpass satisfy the crossing relation

$$a + c \equiv 2b \pmod{n}.$$


Each link diagram has n trivial n-labelings, which use the same label for 
all the arcs of the diagram, so we are interested in nontrivial labelings, 
which use at least two different labels. For example, any link that consists 
of two disjoint “far away” parts in the plane has at least $n^2$ different Fox
n-labelings. Now we observe a crucial fact: 


Claim. If two diagrams represent equivalent links, then they have the same number of Fox $n$-labelings.


As explained in the appendix to this chapter, the diagrams for equivalent 
links are connected by continuous deformations and a finite sequence of 
Reidemeister moves of types I, II, and III; so all we have to check is that
Reidemeister moves don’t change the number of Fox n-labelings. This is 
apparent from the following sketches, where in each of the separate draw- 
ings all the relations among the labels of different arcs are forced by the 
crossing relations: 


[Figure: The crossing relation, and the labels forced by the crossing relations before and after Reidemeister moves of types I, II and III]
In particular, for the Reidemeister moves of type III, for arbitrary labels a, b, and c the crossing relations finally force us to put the label

$$x = 2(2a - b) - (2a - c) = c + 2a - 2b$$

before the move and

$$y = 2a - (2b - c) = c + 2a - 2b$$

after the move, so the two labels agree. This establishes the Claim!
Now we simply have to count the labelings. The interesting observations will occur for odd n ≥ 3.

For the Borromean rings we claim that all Fox n-labelings are trivial if n ≥ 3 is odd: If in the standard diagram for the Borromean rings the outer arcs get the labels a, b, and c (as sketched in the margin), then the outer crossings force the inner arcs to have labels 2b − a, 2c − b, and 2a − c, and at the inner crossings of the diagram we need that

$$2(2b - a) \equiv c + (2a - c), \qquad 2(2c - b) \equiv a + (2b - a), \qquad 2(2a - c) \equiv b + (2c - b) \pmod{n},$$

that is, 4a ≡ 4b ≡ 4c (mod n). Since n is odd, 4 is invertible modulo n, and hence a ≡ b ≡ c (mod n). (For every even n ≥ 2, nontrivial labelings exist.) In particular, the Borromean rings have only the trivial Fox 3-labelings and 5-labelings.

[Margin figures: labels for the Borromean rings; labels for Tait's link No. 18]
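A brute-force check of these counts is straightforward. The following Python sketch is our own illustration, not part of the original argument. It encodes a diagram as a list of crossings `(over_arc, under_arc_1, under_arc_2)`; the crossing list for the Borromean rings is read off from the labels a, b, c, 2b − a, 2c − b, 2a − c used above, and the function simply tries every assignment of labels modulo n.

```python
from itertools import product

# Arc indices for the Borromean diagram described in the text:
# outer arcs a, b, c and inner arcs x = 2b - a, y = 2c - b, z = 2a - c.
A, B, C, X, Y, Z = range(6)

# Each crossing is (over_arc, under_arc_1, under_arc_2); the Fox relation
# requires under_1 + under_2 == 2 * over (mod n).
BORROMEAN = [
    (B, A, X), (C, B, Y), (A, C, Z),   # outer crossings
    (X, C, Z), (Y, A, X), (Z, B, Y),   # inner crossings
]


def count_fox_labelings(num_arcs, crossings, n):
    """Count all Fox n-labelings (trivial ones included) by brute force."""
    count = 0
    for labels in product(range(n), repeat=num_arcs):
        if all((labels[u1] + labels[u2] - 2 * labels[o]) % n == 0
               for o, u1, u2 in crossings):
            count += 1
    return count


for n in (2, 3, 5):
    borromean = count_fox_labelings(6, BORROMEAN, n)
    trivial = count_fox_labelings(3, [], n)  # three circles, no crossings
    print(n, borromean, trivial)
```

With this encoding the counts for n = 3 and n = 5 are 3 and 5 (only the trivial labelings), the crossing-free three-component diagram gives n³, and for n = 2 the function finds 8 labelings, in line with the remark about even n.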


For Tait's link No. 18, a very similar calculation, with the labels a, b, c, d, e, and f assigned to the outer arcs, leads to inner labels 2a − b, 2b − c, 2c − d, etc., and then finally to the conditions

$$a - d \equiv 4(b - c), \qquad b - e \equiv 4(c - d), \qquad c - f \equiv 4(d - e), \qquad \ldots \pmod{n}.$$


For n = 3, where 4 ≡ 1, this yields a − b + c − d ≡ 0, b − c + d − e ≡ 0, etc., and we quickly derive that a = b = ⋯ = f, so again there are only the trivial 3-labelings.

However, for n = 5, where 4 ≡ −1, we find that we have to solve the equations a + b ≡ c + d, b + c ≡ d + e, etc., and this leads us to the solutions with arbitrary a = c = e and b = d = f (and no others). Thus there are 5² = 25 Fox 5-labelings for this link.
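These two small linear systems can likewise be checked by brute force. This is a sketch under an explicit assumption: we read the “…” in the conditions above as continuing cyclically through a, b, c, d, e, f (so the last ones are d − a ≡ 4(e − f), e − b ≡ 4(f − a), f − c ≡ 4(a − b)); that reading is ours, not stated in the text. Since the inner labels are determined by the outer ones, the number of solutions is the number of labelings.

```python
from itertools import product


def tait18_outer_solutions(n):
    """Count outer labelings (a, ..., f) mod n with
    x[i] - x[i+3] == 4 * (x[i+1] - x[i+2])  (mod n), indices taken mod 6.

    Assumption: the conditions a - d = 4(b - c), b - e = 4(c - d), ...
    continue cyclically; this is our reading of the "..." in the text.
    """
    count = 0
    for x in product(range(n), repeat=6):
        if all((x[i] - x[(i + 3) % 6]
                - 4 * (x[(i + 1) % 6] - x[(i + 2) % 6])) % n == 0
               for i in range(6)):
            count += 1
    return count


print(tait18_outer_solutions(3), tait18_outer_solutions(5))
```

Under this reading the search reproduces the counts stated above: 3 solutions for n = 3 and 25 for n = 5.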


The trivial three-component link clearly has n³ Fox n-labelings, that is, it has 27 Fox 3-labelings and 125 Fox 5-labelings.


Thus the Borromean rings, Tait’s link No. 18, and the trivial link with three 
components have different numbers of Fox 5-labelings (5, 25, and 125, 
respectively), so they are nonequivalent links. 
Appendix: Basic notions on knots and links


Topologists define a knot as the image of a continuous embedding of a circle in ℝ³; a differential geometer might add that we are not interested in “wild” knots, but only in “tame” ones that are smooth curves. A link is obtained from a smooth embedding of a disjoint union of circles, known as the components of the link. Knots and links can also be treated as combinatorial objects, as any projection of a smooth knot or link to the plane along a sufficiently “generic” direction leads to a representation by a diagram, that is, a drawing of the knot or link by smooth curves in the plane with only a finite number of crossings, at which exactly two different parts of the knot or link cross, and where we indicate an over- or under-pass in a “trompe l'œil” fashion.

When are two knots, or two links, “the same”? Topologically, two links L and L′ are defined to be equivalent if there is an orientation-preserving homeomorphism between (ℝ³, L) and (ℝ³, L′), that is, a continuous and bijective map h: ℝ³ → ℝ³ with a continuous inverse such that h(L) = L′. Geometrically, we can describe this by a continuous deformation of space that moves L to L′. Such deformations might be hard to describe and analyze, but in 1926 Kurt Reidemeister proved a very useful combinatorial characterization: Two diagrams drawn in the plane describe equivalent knots or links if and only if one can be obtained from the other by continuous deformations and a finite number of local operations that are now known as the Reidemeister moves of types I, II, and III.


[Figure: the Reidemeister moves of types I, II, and III]


The “if” part of Reidemeister’s theorem is quite obvious. For the “only if” 
direction one studies a smooth deformation of L to L’, where also the direc- 
tions and curvatures along the curves are required to change continuously. 
If we then maintain a “general position” projection to a plane, this will give 
us a continuous deformation of one diagram to the other with only a finite 
number of Reidemeister-type moves on the way. 

A knot is trivial if it is equivalent to a perfect (geometric) circle in ℝ³, or equivalently, if it admits a spanning disk whose interior is disjoint from the knot. More generally, a link with k components is trivial if it is equivalent to a link formed by k “far apart” circles that have disjoint spanning disks.


Touching simplices


How many d-dimensional simplices can be positioned in ℝ^d so that they touch in such a way that all their pairwise intersections are (d − 1)-dimensional?


This is an old and very natural question. We shall call f(d) the answer to this problem, and record f(1) = 2, which is trivial. For d = 2 the configuration of four triangles in the margin shows f(2) ≥ 4. There is no similar configuration with five triangles, because from this the dual graph construction, which for our example with four triangles yields a planar drawing of K₄, would give a planar embedding of K₅, which is impossible (see page 91). Thus we have

$$f(2) = 4.$$
In three dimensions, f(3) ≥ 8 is quite easy to see. For that we use the configuration of eight triangles depicted on the right. The four shaded triangles are joined to some point x below the “plane of drawing,” which yields four tetrahedra that touch the plane from below. Similarly, the four white triangles are joined to some point y above the plane of drawing. So we obtain a configuration of eight touching tetrahedra in ℝ³, that is, f(3) ≥ 8.
In 1965, Baston wrote a book proving f(3) ≤ 9, and in 1991 it took Zaks another book to establish

$$f(3) = 8.$$
With f(1) = 2, f(2) = 4 and f(3) = 8, it doesn't take much inspiration to arrive at the following conjecture, first posed by Bagemihl in 1956.

Conjecture. The maximal number of pairwise touching d-simplices in a configuration in ℝ^d is

$$f(d) = 2^d.$$

The lower bound, f(d) ≥ 2^d, is easy to verify “if we do it right.” This amounts to a heavy use of affine coordinate transformations, and to an induction on the dimension that establishes the following stronger result, due to Joseph Zaks [4].


Theorem 1. For every d ≥ 2, there is a family of 2^d pairwise touching d-simplices in ℝ^d together with a transversal line that hits the interior of every single one of them.
Proof. For d = 2 the family of four triangles that we had considered does have such a transversal line. Now consider any d-dimensional configuration of touching simplices that has a transversal line ℓ. Any nearby parallel line ℓ′ is a transversal line as well. If we choose ℓ′ and ℓ parallel and close enough, then each of the simplices contains an orthogonal (shortest) connecting interval between the two lines. Only a bounded part of the lines ℓ and ℓ′ is contained in the simplices of the configuration, and we may add two connecting segments outside the configuration, such that the rectangle spanned by the two outside connecting segments (that is, their convex hull) contains all the other connecting segments. Thus, we have placed a “ladder” such that each of the simplices of the configuration has one of the ladder's steps in its interior, while the four ends of the ladder are outside the configuration.

Now the main step is that we perform an (affine) coordinate transformation that maps ℝ^d to ℝ^d, and takes the rectangle spanned by the ladder to the rectangle (half-square) as shown in the figure below, given by

$$R^1 = \{(x_1, x_2, 0, \ldots, 0)^T : -1 \le x_1 \le 0,\ -1 \le x_2 \le 1\}.$$


Thus the configuration of touching simplices Σ¹ in ℝ^d which we obtain has the x₁-axis as a transversal line, and it is placed such that each of the simplices contains a segment

$$S^1(\alpha) = \{(\alpha, x_2, 0, \ldots, 0)^T : -1 \le x_2 \le 1\}$$

in its interior (for some α with −1 < α < 0), while the origin 0 is outside all simplices.

Now we produce a second copy Σ² of this configuration by reflecting the first one in the hyperplane given by x₁ = x₂. This second configuration has the x₂-axis as a transversal line, and each simplex contains a segment

$$S^2(\beta) = \{(x_1, \beta, 0, \ldots, 0)^T : -1 \le x_1 \le 1\}$$

in its interior, with −1 < β < 0. But each segment S¹(α) intersects each segment S²(β) (in the point (α, β, 0, …, 0)^T), and thus the interior of each simplex of Σ¹ intersects each simplex of Σ² in its interior. Thus if we add a new (d + 1)-st coordinate x_{d+1}, and take Σ to be


$$\{\operatorname{conv}(P_i \cup \{-e_{d+1}\}) : P_i \in \Sigma^1\} \cup \{\operatorname{conv}(P_j \cup \{e_{d+1}\}) : P_j \in \Sigma^2\},$$

then we get a configuration of touching (d + 1)-simplices in ℝ^{d+1}. Furthermore, the antidiagonal

$$A = \{(x, -x, 0, \ldots, 0)^T : x \in \mathbb{R}\} \subseteq \mathbb{R}^d$$

intersects all segments S¹(α) and S²(β). We can “tilt” it a little, and obtain a line

$$L_\varepsilon = \{(x, -x, 0, \ldots, 0, \varepsilon x)^T : x \in \mathbb{R}\} \subseteq \mathbb{R}^{d+1},$$

which for all small enough ε > 0 intersects all the simplices of Σ. This completes our induction step.
In contrast to this exponential lower bound, tight upper bounds are harder to get. A naive inductive argument (considering all the facet hyperplanes in a touching configuration separately) yields only

$$f(d) \le \tfrac{2}{3}(d+1)!,$$

and this is quite far from the lower bound of Theorem 1. However, Micha Perles found the following “magical” proof for a much better bound.


Theorem 2. For all d ≥ 1, we have f(d) < 2^{d+1}.


Proof. Given a configuration of r touching d-simplices P₁, P₂, …, P_r in ℝ^d, first enumerate the different hyperplanes H₁, H₂, …, H_s spanned by facets of the P_i, and for each of them arbitrarily choose a positive side H_j^+, and call the other side H_j^−.

For example, for the 2-dimensional configuration of r = 4 triangles de- 
picted on the right we find s = 6 hyperplanes (which are lines for d = 2). 
From these data, we construct the B-matrix, an r × s matrix with entries in {+1, −1, 0}, as follows:


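$$B_{ij} = \begin{cases} +1 & \text{if } P_i \text{ has a facet in } H_j \text{ and } P_i \subseteq H_j^+,\\ -1 & \text{if } P_i \text{ has a facet in } H_j \text{ and } P_i \subseteq H_j^-,\\ 0 & \text{if } P_i \text{ does not have a facet in } H_j. \end{cases}$$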


For example, the 2-dimensional configuration in the margin gives rise to 
the matrix 


[Matrix B (4 × 6) for the configuration in the margin]


Three properties of the B-matrix are worth recording. First, since every d-simplex has d + 1 facets, we find that every row of B has exactly d + 1 nonzero entries, and thus has exactly s − (d + 1) zero entries. Secondly, we are dealing with a configuration of pairwise touching simplices, and thus for every pair of rows we find a column (belonging to a hyperplane that contains facets of both simplices and separates them) in which one row has a +1 entry, while the entry in the other row is −1. That is, the rows are different even if we disregard their zero entries. Thirdly, the rows of B “represent” the simplices P_i, via

$$P_i = \bigcap_{j:\,B_{ij}=+1} H_j^+ \;\cap \bigcap_{j:\,B_{ij}=-1} H_j^-. \qquad (*)$$


Now we derive from B a new matrix C, in which every row of B is replaced by all the row vectors that one can generate from it by replacing all the zeros by either +1 or −1. Since each row of B has s − d − 1 zeros, and B has r rows, the matrix C has 2^{s−d−1} r rows.
The first row of the C-matrix represents
the shaded triangle, while the second 
row corresponds to an empty intersec- 
tion of the halfspaces. The point x leads 
to the vector 


(1 -1 1 #1 -1 #1) 


that does not appear in the C-matrix. 


For our example, this matrix C is a 32 × 6 matrix that starts

$$C = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1 & -1\\ 1 & 1 & 1 & -1 & 1 & 1\\ 1 & 1 & 1 & -1 & 1 & -1\\ 1 & -1 & 1 & 1 & 1 & 1\\ 1 & -1 & 1 & 1 & 1 & -1\\ 1 & -1 & 1 & -1 & 1 & 1\\ 1 & -1 & 1 & -1 & 1 & -1\\ \vdots & & & & & \vdots \end{pmatrix},$$

where the first eight rows of C are derived from the first row of B, the second eight rows come from the second row of B, etc.

The point now is that all the rows of C are different: If two rows are derived from the same row of B, then they are different since their zeros have been replaced differently; if they are derived from different rows of B, then they differ no matter how the zeros have been replaced. But the rows of C are ±1-vectors of length s, and there are only 2^s different such vectors. Thus since the rows of C are distinct, C can have at most 2^s rows, that is,

$$2^{s-d-1} r \le 2^s.$$


However, not all possible ±1-vectors appear in C, which yields a strict inequality 2^{s−d−1} r < 2^s, and thus r < 2^{d+1}. To see this, we note that every row of C represents an intersection of halfspaces, just as for the rows of B before, via the formula (*). This intersection is a subset of the simplex P_i, which was given by the corresponding row of B. Let us take a point x ∈ ℝ^d that does not lie on any of the hyperplanes H_j, and not in any of the simplices P_i. From this x we derive a ±1-vector that records for each j whether x ∈ H_j^− or x ∈ H_j^+. This ±1-vector does not occur in C, because its halfspace intersection according to (*) contains x and thus is not contained in any simplex P_i.
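The counting at the heart of Perles' argument is easy to replay mechanically. The sketch below uses a tiny configuration of our own choosing, not the one in the margin: for d = 1 take the two touching segments P₁ = [−1, 0] and P₂ = [0, 1], whose facet “hyperplanes” are the points −1, 0, 1 (so s = 3), with the positive side of each chosen to the right. The code builds C from B, confirms that its rows are distinct, and checks that the sign vector of the point x = 2, which lies outside both segments, is missing from C.

```python
from itertools import product

# Hypothetical example (d = 1): P1 = [-1, 0] and P2 = [0, 1] on the real line.
# Facet "hyperplanes" are the points H1 = -1, H2 = 0, H3 = 1, and we choose
# H_j^+ = {t >= h_j} as the positive side of each.
HYPERPLANES = (-1.0, 0.0, 1.0)
d = 1
B = [
    (+1, -1, 0),   # P1: facet at -1 with P1 in H1^+, facet at 0 with P1 in H2^-
    (0, +1, -1),   # P2: facet at 0 with P2 in H2^+, facet at 1 with P2 in H3^-
]


def expand_rows(b_matrix):
    """Build C: replace the zeros of each row of B by +1/-1 in all possible ways."""
    rows = []
    for row in b_matrix:
        zeros = [j for j, v in enumerate(row) if v == 0]
        for signs in product((+1, -1), repeat=len(zeros)):
            new_row = list(row)
            for j, sign in zip(zeros, signs):
                new_row[j] = sign
            rows.append(tuple(new_row))
    return rows


C = expand_rows(B)
r, s = len(B), len(HYPERPLANES)

# Rows of C are pairwise distinct, hence 2^(s-d-1) * r = len(C) <= 2^s.
assert len(set(C)) == len(C) == 2 ** (s - d - 1) * r

# A point outside both segments and off all hyperplanes gives a sign
# vector that cannot occur in C, so the inequality is strict.
x = 2.0
x_signs = tuple(+1 if x > h else -1 for h in HYPERPLANES)
print(C)
print(x_signs, x_signs in C)
```

Here r = 2 < 2^{d+1} = 4, and the four rows of C leave four of the eight ±1-vectors unused.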
Every large point set
has an obtuse angle


Around 1950 Paul Erdős conjectured that every set of more than 2^d points in ℝ^d determines at least one obtuse angle, that is, an angle that is strictly greater than π/2. In other words, any set of points in ℝ^d which only has acute angles (including right angles) has size at most 2^d. This problem was posed as a “prize question” by the Dutch Mathematical Society, but solutions were received only for d = 2 and for d = 3.

For d = 2 the problem is easy: The five points may determine a convex 
pentagon, which always has an obtuse angle (in fact, at least one angle of 
at least 108°). Otherwise we have one point contained in the convex hull 
of three others that form a triangle. But this point “sees” the three edges of 
the triangle in three angles that sum to 360°, so one of the angles is at least 
120°. (The second case also includes situations where we have three points 
on a line, and thus a 180° angle.) 


Unrelated to this, Victor Klee asked a few years later (and Erdős spread the question) how large a point set in ℝ^d could be and still have the following “antipodality property”: For any two points in the set there is a strip (bounded by two parallel hyperplanes) that contains the point set, and that has the two chosen points on its two different boundary hyperplanes.

Then, in 1962, Ludwig Danzer and Branko Grünbaum solved both problems in one stroke: They sandwiched both maximal sizes into a chain of inequalities, which starts and ends in 2^d. Thus the answer is 2^d both for Erdős' and for Klee's problem.


In the following, we consider (finite) sets S ⊆ ℝ^d of points, their convex hulls conv(S), and general convex polytopes Q ⊆ ℝ^d. (See the appendix on polytopes on page 73 for the basic concepts.) We assume that these sets have the full dimension d, that is, they are not contained in a hyperplane. Two convex sets touch if they have at least one boundary point in common, while their interiors do not intersect. For any set Q ⊆ ℝ^d and any vector s ∈ ℝ^d we denote by Q + s the image of Q under the translation that moves 0 to s. Similarly, Q − s is the translate obtained by the map that moves s to the origin.

Don’t be intimidated: This chapter is an excursion into d-dimensional 
geometry, but the arguments in the following do not require any “high- 
dimensional intuition,” since they all can be followed, visualized (and thus 
understood) in three dimensions, or even in the plane. Hence, our figures 
will illustrate the proof for d = 2 (where a “hyperplane” is just a line), and 
you could create your own pictures for d = 3 (where a “hyperplane” is 
a plane). 
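Theorem 1. For every d, one has the following chain of inequalities:

$$
\begin{aligned}
2^d &\overset{(1)}{\le} \max\bigl\{\#S \bigm| S \subseteq \mathbb{R}^d,\ \measuredangle(s_i, s_j, s_k) \le \tfrac{\pi}{2} \text{ for every } \{s_i, s_j, s_k\} \subseteq S\bigr\}\\[4pt]
&\overset{(2)}{\le} \max\left\{\#S \;\middle|\; \begin{array}{l}
S \subseteq \mathbb{R}^d \text{ such that for any two points } \{s_i, s_j\} \subseteq S\\
\text{there is a strip } S(i,j) \text{ that contains } S, \text{ with } s_i \text{ and } s_j\\
\text{lying in the parallel boundary hyperplanes of } S(i,j)
\end{array}\right\}\\[4pt]
&\overset{(3)}{\le} \max\left\{\#S \;\middle|\; \begin{array}{l}
S \subseteq \mathbb{R}^d \text{ such that the translates } P - s_i,\ s_i \in S, \text{ of}\\
\text{the convex hull } P := \operatorname{conv}(S) \text{ intersect in a common}\\
\text{point, but they only touch}
\end{array}\right\}\\[4pt]
&\overset{(4)}{\le} \max\left\{\#S \;\middle|\; \begin{array}{l}
S \subseteq \mathbb{R}^d \text{ such that the translates } Q + s_i \text{ of some}\\
d\text{-dimensional convex polytope } Q \subseteq \mathbb{R}^d \text{ touch pairwise}
\end{array}\right\}\\[4pt]
&\overset{(5)}{=} \max\left\{\#S \;\middle|\; \begin{array}{l}
S \subseteq \mathbb{R}^d \text{ such that the translates } Q^* + s_i \text{ of some}\\
d\text{-dimensional centrally symmetric convex polytope}\\
Q^* \subseteq \mathbb{R}^d \text{ touch pairwise}
\end{array}\right\}\\[4pt]
&\overset{(6)}{\le} 2^d.
\end{aligned}
$$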


Proof. We have six claims (equalities and inequalities) to verify. Let's get going.

(1) Take S := {0,1}^d to be the vertex set of the standard unit cube in ℝ^d, and choose s_i, s_j, s_k ∈ S. By symmetry we may assume that s_j = 0 is the zero vector. Hence the angle can be computed from

$$\cos \measuredangle(s_i, s_j, s_k) = \frac{\langle s_i, s_k \rangle}{\|s_i\|\,\|s_k\|},$$

which is clearly nonnegative. Thus S is a set with |S| = 2^d that has no obtuse angles.
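Claim (1) can be confirmed by exhaustive search in small dimensions. The sketch below, our own illustration, runs over all ordered triples of distinct cube vertices, treats the middle one as the apex, and checks that the scalar product ⟨s_i − s_j, s_k − s_j⟩ (which has the sign of the cosine) is never negative; it also counts the right angles, which is where the acute-angle question later in this chapter begins.

```python
from itertools import permutations, product


def cube_angle_stats(d):
    """Return (#obtuse, #right) angles among the vertices of {0,1}^d.

    Each ordered triple (s_i, s_j, s_k) of distinct vertices is read as the
    angle at apex s_j; (s_i, s_j, s_k) and (s_k, s_j, s_i) describe the same
    angle, so both counts include every angle twice.
    """
    vertices = list(product((0, 1), repeat=d))
    obtuse = right = 0
    for si, sj, sk in permutations(vertices, 3):
        dot = sum((a - b) * (c - b) for a, b, c in zip(si, sj, sk))
        if dot < 0:
            obtuse += 1
        elif dot == 0:
            right += 1
    return obtuse, right


for d in (2, 3, 4):
    print(d, cube_angle_stats(d))
```

For the square (d = 2) the search finds no obtuse angle and exactly four right angles, each counted in both orders.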


(2) If S contains no obtuse angles, then for any s_i, s_j ∈ S we may define H_ij + s_i and H_ij + s_j to be the parallel hyperplanes through s_i resp. s_j that are orthogonal to the edge [s_i, s_j]. Here H_ij = {x ∈ ℝ^d : ⟨x, s_i − s_j⟩ = 0} is the hyperplane through the origin that is orthogonal to the line through s_i and s_j, and H_ij + s_j = {x + s_j : x ∈ H_ij} is the translate of H_ij that passes through s_j, etc. Hence the strip between H_ij + s_i and H_ij + s_j consists, besides s_i and s_j, exactly of all the points x ∈ ℝ^d such that the angles ∢(s_i, s_j, x) and ∢(s_j, s_i, x) are nonobtuse. Thus the strip contains all of S.


(3) P is contained in the halfspace of H_ij + s_i that contains s_j if and only if P − s_i is contained in the halfspace of H_ij that contains s_j − s_i: A property “an object is contained in a halfspace” is not destroyed if we translate both the object and the halfspace by the same amount (namely by −s_i). Similarly, P is contained in the halfspace of H_ij + s_j that contains s_i if and only if P − s_j is contained in the halfspace of H_ij that contains s_i − s_j.

Putting both statements together, we find that the polytope P is contained in the strip between H_ij + s_i and H_ij + s_j if and only if P − s_i and P − s_j lie in different halfspaces with respect to the hyperplane H_ij.
This correspondence is illustrated by the sketch in the margin. Furthermore, from s_i ∈ P = conv(S) we get that the origin 0 is contained in all the translates P − s_i (s_i ∈ S). Thus we see that the sets P − s_i all intersect in 0, but they only touch: their interiors are pairwise disjoint, since they lie on opposite sides of the corresponding hyperplanes H_ij.


(4) This we get for free: “the translates must touch pairwise” is a weaker 
condition than “they intersect in a common point, but only touch.” 
Similarly, we can relax the conditions by letting P be an arbitrary convex 
d-polytope in ℝ^d. Furthermore, we may replace S by −S.


(5) Here “≥” is trivial, but that is not the interesting direction for us. We have to start with a configuration S ⊆ ℝ^d and an arbitrary d-polytope Q ⊆ ℝ^d such that the translates Q + s_i (s_i ∈ S) touch pairwise. The claim is that in this situation we can use

$$Q^* = \{\tfrac{1}{2}(x - y) \in \mathbb{R}^d : x, y \in Q\}$$

instead of Q. But this is not hard to see: First, Q* is d-dimensional, convex, and centrally symmetric. One can check that Q* is a polytope (its vertices are of the form ½(q_i − q_j), for vertices q_i, q_j of Q), but this is not important for us.
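To make Q* concrete, here is a small planar sketch (d = 2, with a triangle chosen by us purely for illustration). It forms all points ½(q_i − q_j) for vertices q_i, q_j of Q and takes their convex hull with Andrew's monotone-chain algorithm, using exact fractions. For a triangle the result is a centrally symmetric hexagon, consistent with the remark that the vertices of Q* have the form ½(q_i − q_j).

```python
from fractions import Fraction


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counterclockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def minkowski_symmetrization(vertices):
    """Vertices of Q* = {(x - y)/2 : x, y in Q} for the polygon Q = conv(vertices)."""
    half = Fraction(1, 2)
    diffs = [(half * (p[0] - q[0]), half * (p[1] - q[1]))
             for p in vertices for q in vertices]
    return convex_hull(diffs)


triangle = [(0, 0), (2, 0), (0, 2)]
q_star = minkowski_symmetrization(triangle)
print(len(q_star), q_star)
```

Only differences of vertices are needed, because ½(Q − Q) is the convex hull of the points ½(q_i − q_j).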

Now we will show that Q + s_i and Q + s_j touch if and only if Q* + s_i and Q* + s_j touch. For this we note, in the footsteps of Minkowski, that

$$
\begin{aligned}
(Q^* + s_i) \cap (Q^* + s_j) \ne \emptyset
&\iff \exists\, q_i', q_i'', q_j', q_j'' \in Q:\ \tfrac12(q_i' - q_i'') + s_i = \tfrac12(q_j' - q_j'') + s_j\\
&\iff \exists\, q_i', q_i'', q_j', q_j'' \in Q:\ \tfrac12(q_i' + q_j'') + s_i = \tfrac12(q_j' + q_i'') + s_j\\
&\iff \exists\, q_i, q_j \in Q:\ q_i + s_i = q_j + s_j\\
&\iff (Q + s_i) \cap (Q + s_j) \ne \emptyset,
\end{aligned}
$$

where in the third (and crucial) equivalence “⟺” we use that every q ∈ Q can be written as q = ½(q + q) to get “⟸”, and that Q is convex and thus ½(q_i′ + q_j″), ½(q_j′ + q_i″) ∈ Q to see “⟹”.

Thus the passage from Q to Q* (known as Minkowski symmetrization) preserves the property that two translates Q + s_i and Q + s_j intersect. That is, we have shown that for any convex set Q, two translates Q + s_i and Q + s_j intersect if and only if the translates Q* + s_i and Q* + s_j intersect.


The following characterization shows that Minkowski symmetrization also preserves the property that two translates touch:

Q + s_i and Q + s_j touch if and only if they intersect, while Q + s_i and Q + s_j + ε(s_j − s_i) do not intersect for any ε > 0.


(6) Assume that Q* + s_i and Q* + s_j touch. For every intersection point

$$x \in (Q^* + s_i) \cap (Q^* + s_j)$$

we have

$$x - s_i \in Q^* \quad\text{and}\quad x - s_j \in Q^*,$$

thus, since Q* is centrally symmetric,

$$s_j - x = -(x - s_j) \in Q^*,$$

and hence, since Q* is convex,

$$\tfrac12(s_j - s_i) = \tfrac12\bigl((x - s_i) + (s_j - x)\bigr) \in Q^*.$$

We conclude that ½(s_i + s_j) is contained in Q* + s_i for all j. Consequently, for P := conv(S) we get

$$P_i := \tfrac12(P + s_i) = \operatorname{conv}\{\tfrac12(s_i + s_j) : s_j \in S\} \subseteq Q^* + s_i,$$

which implies that the sets P_i = ½(P + s_i) can only touch.


Finally, the sets P_i are contained in P, because all the points s_i, s_j and ½(s_i + s_j) are in P, since P is convex. But the P_i are just smaller, scaled, translates of P, contained in P. The scaling factor is ½, which implies that

$$\operatorname{vol}(P_i) = \frac{1}{2^d}\operatorname{vol}(P),$$

since we are dealing with d-dimensional sets. This means that at most 2^d sets P_i fit into P, and hence |S| ≤ 2^d.
This completes our proof: the chain of inequalities is closed.


... but that's not the end of the story. Danzer and Grünbaum asked the
following natural question: 


What happens if one requires all angles to be acute rather than 
just nonobtuse, that is, if right angles are forbidden? 


They constructed configurations of 2d − 1 points in ℝ^d with only acute angles, conjecturing that this may be best possible. Grünbaum proved that this is indeed true for d ≤ 3. But twenty-one years later, in 1983, Paul Erdős and Zoltán Füredi showed that the conjecture is false, quite dramatically so if the dimension is high! Their proof is a great example of the power of probabilistic arguments; see Chapter 45 for an introduction to the “probabilistic method.” Our version of the proof uses a slight improvement in the choice of the parameters due to our reader David Bevan.


Theorem 2. For every d ≥ 2, there is a set S ⊆ {0,1}^d of

$$2\left\lfloor \frac{\sqrt{6}}{9}\left(\frac{2}{\sqrt{3}}\right)^{d} \right\rfloor$$

points in ℝ^d (vertices of the unit d-cube) that determine only acute angles.

In particular, in dimension d = 34 there is a set of 72 > 2·34 − 1 points with only acute angles.


Proof. Set $m := \left\lfloor \frac{\sqrt{6}}{9}\left(\frac{2}{\sqrt{3}}\right)^{d} \right\rfloor$, and pick 3m vectors

$$x(1), x(2), \ldots, x(3m) \in \{0,1\}^d$$

by choosing all their coordinates independently and randomly, to be either 0 or 1, with probability ½ for each alternative. (You may toss a perfect coin 3md times for this; however, if d is large you may get bored by this soon.)
We have seen above that all angles determined by 0/1-vectors are nonobtuse. Three vectors x(i), x(j), x(k) determine a right angle with apex x(j) if and only if the scalar product ⟨x(i) − x(j), x(k) − x(j)⟩ vanishes. For 0/1-vectors every summand (x(i)_ℓ − x(j)_ℓ)(x(k)_ℓ − x(j)_ℓ) of this scalar product is 0 or 1, so this happens exactly if we have

$$x(i)_\ell - x(j)_\ell = 0 \quad\text{or}\quad x(k)_\ell - x(j)_\ell = 0 \qquad \text{for each coordinate } \ell.$$

We call (i, j, k) a bad triple if this happens. (If x(i) = x(j) or x(j) = x(k), then the angle is not defined, but also then the triple (i, j, k) is certainly bad.)


The probability that one specific triple is bad is exactly (3/4)^d: Indeed, it will be good if and only if, for at least one of the d coordinates ℓ, we get

$$\text{either}\quad x(i)_\ell = x(k)_\ell = 0,\ x(j)_\ell = 1, \qquad\text{or}\quad x(i)_\ell = x(k)_\ell = 1,\ x(j)_\ell = 0.$$

This leaves us with six bad options out of eight equally likely ones, and a triple will be bad if and only if one of the bad options (with probability 3/4) happens for each of the d coordinates.
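The count of six bad options out of eight, and the resulting probability (3/4)^d, can be confirmed by enumeration. The sketch below, our own check, uses exact fractions and, for small d, averages over all 2^{3d} equally likely choices of x(i), x(j), x(k).

```python
from fractions import Fraction
from itertools import product


def is_bad(xi, xj, xk):
    """Right (or undefined) angle at apex xj: <xi - xj, xk - xj> == 0."""
    return sum((a - b) * (c - b) for a, b, c in zip(xi, xj, xk)) == 0


# One coordinate: the values (x(i)_l, x(j)_l, x(k)_l) are "good" only for
# (0, 1, 0) and (1, 0, 1); all other six options are bad.
bad_options = [t for t in product((0, 1), repeat=3)
               if (t[0] - t[1]) * (t[2] - t[1]) == 0]
print(len(bad_options))  # 6


def bad_probability(d):
    """Exact probability that a random triple of 0/1-vectors in dimension d is bad."""
    vectors = list(product((0, 1), repeat=d))
    total = bad = 0
    for xi, xj, xk in product(vectors, repeat=3):
        total += 1
        bad += is_bad(xi, xj, xk)
    return Fraction(bad, total)


for d in (1, 2, 3):
    print(d, bad_probability(d), Fraction(3, 4) ** d)
```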


The number of triples we have to consider is $3\binom{3m}{3}$, since there are $\binom{3m}{3}$ sets of three vectors, and for each of them there are three choices for the apex. Of course the probabilities that the various triples are bad are not independent: but linearity of expectation (which is what you get by averaging over all possible selections; see the appendix) yields that the expected number of bad triples is exactly $3\binom{3m}{3}\left(\frac34\right)^d$. This means (and this is the point where the probabilistic method shows its power) that there is some choice of the 3m vectors such that there are at most $3\binom{3m}{3}\left(\frac34\right)^d$ bad triples, where

$$3\binom{3m}{3}\left(\frac{3}{4}\right)^{d} < 3\,\frac{(3m)^3}{6}\left(\frac{3}{4}\right)^{d} = m^3\left(\frac{9}{\sqrt{6}}\right)^{2}\left(\frac{3}{4}\right)^{d} \le m,$$

by the choice of m.
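The arithmetic behind the choice of m is easy to check numerically. The sketch below computes m = ⌊(√6/9)(2/√3)^d⌋ in floating point (adequate for the moderate d tested here, though that is an assumption worth keeping in mind) and compares the expected number $3\binom{3m}{3}(3/4)^d$ of bad triples with m.

```python
import math


def choose_m(d):
    """m = floor(sqrt(6)/9 * (2/sqrt(3))**d), computed in floating point."""
    return math.floor(math.sqrt(6) / 9 * (2 / math.sqrt(3)) ** d)


def expected_bad_triples(d, m):
    """3 * C(3m, 3) * (3/4)**d: three apex choices per 3-subset of vectors."""
    return 3 * math.comb(3 * m, 3) * 0.75 ** d


for d in (15, 20, 34, 50):
    m = choose_m(d)
    print(d, m, 2 * m, round(expected_bad_triples(d, m), 2))
```

For d = 34 this gives m = 36, so 2m = 72 points, matching the statement of Theorem 2.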
But if there are not more than m bad triples, then we can remove m of the 3m vectors x(i) in such a way that the remaining 2m vectors don't contain a bad triple (delete one vector from each bad triple, and then further vectors if necessary), that is, they determine acute angles only.


The “probabilistic construction” of a large set of 0/1-points without right angles can be easily implemented. David Bevan has thus constructed a set of 31 0/1-points in dimension d = 15 that determines only acute angles. Very recently, Balázs Gerencsér and Viktor Harangi, building on ideas of an anonymous Ukrainian enthusiast, managed to construct such “acute-angled sets” of size 2^{d−1} + 1 for all dimensions d, which however no longer consist of 0/1-vectors. As we have seen above, the size 2^{d−1} + 1 is optimal up to a factor of 2.
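Here is one way such an implementation might look; it is our own sketch, not Bevan's program. Vectors in {0,1}^d are stored as d-bit integers, and the angle at apex x(j) is acute exactly when some coordinate has x(i)_ℓ = x(k)_ℓ ≠ x(j)_ℓ, that is, when (x(i) XOR x(j)) AND (x(k) XOR x(j)) ≠ 0. The function resamples until at most m bad triples occur (the expectation bound guarantees that such a choice exists, but the number of attempts is random), deletes one vector from each bad triple, and keeps 2m of the rest.

```python
import math
import random
from itertools import combinations


def bad_triples(vecs):
    """All (i, apex, k) whose angle at vecs[apex] is not acute."""
    bad = []
    for i, j, k in combinations(range(len(vecs)), 3):
        for a, apex, c in ((i, j, k), (j, i, k), (i, k, j)):
            # acute iff some bit has vecs[a] == vecs[c] != vecs[apex]
            if (vecs[a] ^ vecs[apex]) & (vecs[c] ^ vecs[apex]) == 0:
                bad.append((a, apex, c))
    return bad


def acute_set(d, rng):
    """Random construction of 2m acute-angled 0/1-vectors, as in the proof."""
    m = math.floor(math.sqrt(6) / 9 * (2 / math.sqrt(3)) ** d)
    while True:
        vecs = [rng.getrandbits(d) for _ in range(3 * m)]
        bad = bad_triples(vecs)
        if len(bad) <= m:
            break
    # Deleting one vector per bad triple removes at most m vectors.
    removed = {triple[0] for triple in bad}
    survivors = [v for idx, v in enumerate(vecs) if idx not in removed]
    return survivors[:2 * m]


points = acute_set(34, random.Random(2024))
print(len(points))
```

The bit-mask representation keeps the roughly 600,000 angle checks for d = 34 fast enough in pure Python.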


Appendix: Three tools from probability 


Here we gather three basic tools from discrete probability theory which 
will come up several times: random variables, linearity of expectation and 
Markov’s inequality. 
Let (Ω, p) be a finite probability space, that is, Ω is a finite set and p = Prob is a map from Ω into the interval [0, 1] with $\sum_{\omega\in\Omega} p(\omega) = 1$. A random variable X on Ω is a mapping X: Ω → ℝ. We define a probability space on the image set X(Ω) by setting $p(X = x) = \sum_{\omega:\,X(\omega) = x} p(\omega)$. A simple example is an unbiased die (all p(ω) = 1/6) with X = “the number on top when the die is thrown.”

The expectation EX of X is the average to be expected, that is,

$$EX = \sum_{\omega\in\Omega} p(\omega)\,X(\omega).$$


Now suppose X and Y are two random variables on Ω, then the sum X + Y is again a random variable, and we obtain

$$E(X+Y) = \sum_{\omega} p(\omega)\bigl(X(\omega) + Y(\omega)\bigr) = \sum_{\omega} p(\omega)X(\omega) + \sum_{\omega} p(\omega)Y(\omega) = EX + EY.$$


Clearly, this can be extended to any finite linear combination of random 
variables — this is what is called the linearity of expectation. Note that it 
needs no assumption that the random variables have to be “independent” 
in any sense! 
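A tiny exact computation illustrates that no independence is needed. On the probability space of one fair die throw, the sketch below takes X = the number on top and Y = X², which are certainly not independent (the choice of Y is ours, for illustration), and checks E(X + Y) = EX + EY with exact fractions.

```python
from fractions import Fraction

OMEGA = range(1, 7)                      # outcomes of one throw of a fair die
p = {w: Fraction(1, 6) for w in OMEGA}


def expectation(random_variable):
    """E Z = sum over outcomes w of p(w) * Z(w)."""
    return sum(p[w] * random_variable(w) for w in OMEGA)


def X(w):
    """The number on top."""
    return w


def Y(w):
    """A function of X, hence strongly dependent on X."""
    return w * w


def X_plus_Y(w):
    return X(w) + Y(w)


print(expectation(X), expectation(Y), expectation(X_plus_Y))
```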


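Our third tool concerns random variables X which take only nonnegative values, shortly denoted X ≥ 0. Let

$$\operatorname{Prob}(X \ge a) = \sum_{\omega:\,X(\omega) \ge a} p(\omega)$$

be the probability that X is at least as large as some a > 0. Then, since all values of X are nonnegative,

$$EX = \sum_{\omega:\,X(\omega) \ge a} p(\omega)X(\omega) + \sum_{\omega:\,X(\omega) < a} p(\omega)X(\omega) \;\ge\; a \sum_{\omega:\,X(\omega) \ge a} p(\omega),$$

and we have proved Markov's inequality

$$\operatorname{Prob}(X \ge a) \le \frac{EX}{a}.$$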
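Markov's inequality can be checked in the same way. The sketch below, again using a fair die as an illustrative probability space, compares Prob(X ≥ a) with EX/a for several thresholds a.

```python
from fractions import Fraction

OMEGA = range(1, 7)
p = {w: Fraction(1, 6) for w in OMEGA}


def X(w):
    """Number on top of a fair die: a nonnegative random variable."""
    return w


def expectation(rv):
    return sum(p[w] * rv(w) for w in OMEGA)


def prob_at_least(rv, a):
    """Prob(rv >= a) = sum of p(w) over outcomes with rv(w) >= a."""
    return sum(p[w] for w in OMEGA if rv(w) >= a)


for a in (1, 2, 3, 4, 5, 6):
    lhs, rhs = prob_at_least(X, a), expectation(X) / a
    assert lhs <= rhs
    print(a, lhs, rhs)
```

The bound is far from tight here (for a = 6 it gives 7/12 against the true value 1/6), which is typical: Markov's inequality is crude but needs nothing beyond X ≥ 0.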
Borsuk's conjecture


Karol Borsuk’s paper “Three theorems on the n-dimensional euclidean 
sphere” from 1933 is famous because it contained an important result 
(conjectured by Stanisław Ulam) that is now known as the Borsuk–Ulam theorem:

Every continuous map f: S^d → ℝ^d maps two antipodal points of the sphere S^d to the same point in ℝ^d.

We will see the full power of this theorem in a graph theory application in Chapter 43. The paper is famous also because of a problem posed at its end, which became known as Borsuk's Conjecture:

Can every set S ⊆ ℝ^d of bounded diameter diam(S) > 0 be partitioned into at most d + 1 sets of smaller diameter?


The bound d + 1 is best possible: If S is a regular d-dimensional simplex, or just the set of its d + 1 vertices, then no part of a diameter-reducing partition can contain more than one of the simplex vertices. If f(d) denotes the smallest number such that every bounded set S ⊆ ℝ^d has a diameter-reducing partition into f(d) parts, then the example of a regular simplex establishes

$$f(d) \ge d + 1.$$


Borsuk's conjecture was proved for the case when S is a sphere (by Borsuk himself), for smooth bodies S (using the Borsuk–Ulam theorem), for d ≤ 3, ... but the general conjecture remained open. The best available upper bound for f(d) was established by Oded Schramm, who showed that

$$f(d) \le \left(\sqrt{3/2} + \varepsilon\right)^{d}$$

for all large enough d. This bound looks quite weak compared with the conjecture “f(d) = d + 1,” but it suddenly seemed reasonable when Jeff Kahn and Gil Kalai dramatically disproved Borsuk's conjecture in 1993. Sixty years after Borsuk's paper, Kahn and Kalai proved that

$$f(d) > (1.2)^{\sqrt{d}}$$

holds for large enough d, making judicious use of a combinatorial-geometric method of Peter Frankl and Richard Wilson.
[Margin figure: A. Nilli]


A Book version of the Kahn–Kalai proof was provided by A. Nilli: Brief and self-contained, it yields an explicit counterexample to Borsuk's conjecture in dimension d = 946. We present here a modification of this proof, due to Andrei M. Raigorodskii and to Bernulf Weißbach, which reduces the dimension to d = 561, and even to d = 560. Using a novel method, involving special graphs, Andriy V. Bondarenko lowered this to d = 65. In fact, he showed that f(d) > d + 1 holds for every d ≥ 65. His method, however, does not yield an exponential lower bound in d. The current “record” is d = 64, due to Thomas Jenrich.


Theorem. Let q = p^m be a prime power, n := 4q − 2, and $d := \binom{n}{2} = (2q-1)(4q-3)$. Then there is a set S ⊆ {+1, −1}^d of 2^{n−2} points in ℝ^d such that every partition of S, whose parts have smaller diameter than S, has at least

$$\frac{2^{n-2}}{\sum_{i=0}^{q-2}\binom{n-1}{i}}$$

parts. For q = 9 this implies that the Borsuk conjecture is false in dimension d = 561. Furthermore, f(d) > (1.2)^{√d} holds for all large enough d.


Proof. The construction of the set S proceeds in four steps.


(1) Let q be a prime power, set n = 4q − 2, and let

$$Q := \{x \in \{+1, -1\}^n : x_1 = 1,\ \#\{i : x_i = -1\} \text{ is even}\}.$$

This Q is a set of 2^{n−2} vectors in ℝ^n. We will see that ⟨x, y⟩ ≡ 2 (mod 4) holds for all vectors x, y ∈ Q. We will call x, y nearly-orthogonal if |⟨x, y⟩| = 2. We will prove that any subset Q′ ⊆ Q which contains no nearly-orthogonal vectors must be “small”: $|Q'| \le \sum_{i=0}^{q-2}\binom{n-1}{i}$.
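The claims about Q can be checked directly in the smallest interesting case. The sketch below takes q = 3 (so n = 10 and |Q| should be 2^8 = 256), builds Q as in step (1), and collects all scalar products. The reason they are ≡ 2 (mod 4) is that ⟨x, y⟩ = n − 2·#{i : x_i ≠ y_i}, the number of differing coordinates is even because both vectors have an even number of −1's, and n ≡ 2 (mod 4).

```python
from itertools import combinations_with_replacement, product


def build_q(q):
    """Q = {x in {+1,-1}^n : x_1 = 1, even number of -1 entries}, n = 4q - 2."""
    n = 4 * q - 2
    return [
        (1,) + rest
        for rest in product((1, -1), repeat=n - 1)
        if rest.count(-1) % 2 == 0
    ]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


Q = build_q(3)
values = {dot(x, y) for x, y in combinations_with_replacement(Q, 2)}
print(len(Q), sorted(values))
```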


Vectors, matrices, and scalar products 


In our notation all vectors x, y, … are column vectors; the transposed vectors x^T, y^T, … are thus row vectors. The matrix product xx^T is a matrix of rank 1, with (xx^T)_{ij} = x_i x_j.

If x, y are column vectors, then their scalar product is

$$\langle x, y\rangle = \sum_i x_i y_i = x^T y.$$

We will also need scalar products for matrices X, Y ∈ ℝ^{n×n} which can be interpreted as vectors of length n², and thus their scalar product is

$$\langle X, Y\rangle = \sum_{i,j} x_{ij}\, y_{ij}.$$
(2) From Q, we construct the set

$$R := \{xx^T : x \in Q\}$$

of 2^{n−2} symmetric n × n matrices of rank 1. We interpret them as vectors with n² components, R ⊆ ℝ^{n²}. We will show that there are only acute angles between these vectors: they have positive scalar products, which are at least 4. Furthermore, if R′ ⊆ R contains no two vectors with minimal scalar product 4, then |R′| is “small”: $|R'| \le \sum_{i=0}^{q-2}\binom{n-1}{i}$.


(3) From R, we obtain the set of points in $\mathbb{R}^{\binom{n}{2}}$ whose coordinates are the subdiagonal entries of the corresponding matrices:

$$S := \{(xx^T)_{i>j} : xx^T \in R\}.$$

Again, S consists of 2^{n−2} points. The maximal distance between these points is precisely obtained for the nearly-orthogonal vectors x, y ∈ Q. We conclude that a subset S′ ⊆ S of smaller diameter than S must be “small”: $|S'| \le \sum_{i=0}^{q-2}\binom{n-1}{i}$.
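Step (3) can also be checked numerically for q = 3 (n = 10, points in ℝ^45). For x, y ∈ Q the diagonals of xx^T and yy^T agree, so the squared distance between the corresponding points of S is half of ‖xx^T − yy^T‖² = 2n² − 2⟨x, y⟩², that is, n² − ⟨x, y⟩². The sketch below, our own check, verifies this identity for all pairs and confirms that the maximum is attained exactly at the nearly-orthogonal pairs.

```python
from itertools import combinations, product


def build_q(q):
    """Q from step (1): x in {+1,-1}^n, x_1 = 1, even number of -1's; n = 4q - 2."""
    n = 4 * q - 2
    return [(1,) + rest for rest in product((1, -1), repeat=n - 1)
            if rest.count(-1) % 2 == 0]


def to_s_point(x):
    """Point of S: the subdiagonal entries (x x^T)_{ij} = x_i * x_j, i > j."""
    return tuple(x[i] * x[j] for i in range(len(x)) for j in range(i))


def distances_by_scalar_product(q):
    """Map each value of <x, y> (x != y in Q) to the set of squared S-distances."""
    Q = build_q(q)
    S = {x: to_s_point(x) for x in Q}
    table = {}
    for x, y in combinations(Q, 2):
        scalar = sum(a * b for a, b in zip(x, y))
        dist_sq = sum((u - v) ** 2 for u, v in zip(S[x], S[y]))
        table.setdefault(scalar, set()).add(dist_sq)
    return table


n = 10                      # n = 4q - 2 for q = 3
table = distances_by_scalar_product(3)
for scalar in sorted(table):
    print(scalar, sorted(table[scalar]), n * n - scalar * scalar)
```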
(4) Estimates: From (3) we see that one needs at least

$$g(q) := \frac{2^{4q-4}}{\sum_{i=0}^{q-2}\binom{4q-3}{i}}$$

parts in every diameter-reducing partition of S. Thus

$$f(d) \ge \max\{g(q),\, d+1\} \quad\text{for } d = (2q-1)(4q-3).$$


Therefore, whenever we have g(q) > (2q − 1)(4q − 3) + 1, then we have a counterexample to Borsuk's conjecture in dimension d = (2q − 1)(4q − 3). We will calculate below that g(9) > 562, which yields the counterexample in dimension d = 561, and that

$$g(q) > \frac{e}{64q^2}\left(\frac{27}{16}\right)^{q},$$

which yields the asymptotic bound f(d) > (1.2)^{√d} for d large enough.


Details for (1): We start with some harmless divisibility considerations. 


Lemma. The function $P(z) := \binom{z-2}{q-2}$ is a polynomial of degree $q-2$. It yields integer values for all integers $z$. The integer $P(z)$ is divisible by $p$ if and only if $z$ is not congruent to 0 or 1 modulo $q$.


Proof. For this we write the binomial coefficient as
$$P(z) = \binom{z-2}{q-2} = \frac{(z-2)(z-3)\cdots(z-q+1)}{(q-2)(q-3)\cdots 2\cdot 1} \qquad (*)$$
and compare the number of $p$-factors in the denominator and in the numerator. The denominator has the same number of $p$-factors as $(q-2)!$, or as $(q-1)!$, since $q-1$ is not divisible by $p$. Indeed, by the claim in the margin we get an integer with the same number of $p$-factors if we take any product of $q-1$ integers, one from each nonzero residue class modulo $q$.

Claim. If $a \equiv b \not\equiv 0 \pmod q$, then $a$ and $b$ have the same number of $p$-factors.

Proof. We have $a = b + sp^m$, where $b$ is not divisible by $p^m = q$. So every power $p^k$ that divides $b$ satisfies $k < m$, and thus it also divides $a$. The statement is symmetric in $a$ and $b$.

Now if $z$ is congruent to 0 or 1 (mod $q$), then the numerator is also of this type: all factors in the product are from different residue classes, and the only classes that do not occur are the zero class (the multiples of $q$) and the class of either $-1$ or $+1$, but neither $+1$ nor $-1$ is divisible by $p$. Thus denominator and numerator have the same number of $p$-factors, and hence the quotient is not divisible by $p$.

On the other hand, if $z \not\equiv 0, 1 \pmod q$, then the numerator of (*) contains one factor that is divisible by $q = p^m$. At the same time, the product has no factors from two adjacent nonzero residue classes: one of them represents numbers that have no $p$-factors at all, the other one has fewer $p$-factors than $q = p^m$. Hence there are more $p$-factors in the numerator than in the denominator, and the quotient is divisible by $p$.
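The Lemma is easy to spot-check by machine. In the following sketch (an illustration with our own naming; `P` and `lemma_holds` are not from the text) $P(z)$ is evaluated exactly via the quotient (*), and divisibility by $p$ is compared with the condition $z \not\equiv 0, 1 \pmod q$ for every integer $z$ in a finite window and a few prime powers $q = p^m$.

```python
from math import factorial, prod


def P(z, q):
    """Evaluate P(z) = binom(z-2, q-2) = (z-2)(z-3)...(z-q+1) / (q-2)! at an integer z."""
    numerator = prod(z - k for k in range(2, q))  # the q-2 factors of (*)
    # A product of q-2 consecutive integers is divisible by (q-2)!, so // is exact.
    return numerator // factorial(q - 2)


def lemma_holds(p, m, zs=range(-100, 101)):
    """Check: p divides P(z)  <=>  z is not congruent to 0 or 1 modulo q = p^m."""
    q = p ** m
    return all((P(z, q) % p == 0) == (z % q not in (0, 1)) for z in zs)


for p, m in [(2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (5, 1), (7, 1)]:
    print(p ** m, lemma_holds(p, m))
```

Each line of output should read `True` for the prime power shown; of course this only tests a finite range of $z$.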


Now we consider an arbitrary subset $Q' \subseteq Q$ that does not contain any nearly-orthogonal vectors. We want to establish that $Q'$ must be "small."

Claim 1. If $x, y$ are distinct vectors from $Q$, then $\frac14(\langle x,y\rangle + 2)$ is an integer in the range
$$-(q-2) \le \tfrac14(\langle x,y\rangle + 2) \le q-1.$$

Both $x$ and $y$ have an even number of $(-1)$-components, so the number of components in which $x$ and $y$ differ is even, too. Thus
$$\langle x,y\rangle = (4q-2) - 2\,\#\{i : x_i \ne y_i\} \equiv -2 \pmod 4$$
for all $x, y \in Q$, that is, $\frac14(\langle x,y\rangle + 2)$ is an integer.

From $x, y \in \{+1,-1\}^{4q-2}$ we see that $-(4q-2) \le \langle x,y\rangle \le 4q-2$, that is, $-(q-1) \le \frac14(\langle x,y\rangle + 2) \le q$. The lower bound never holds with equality, since $x_1 = y_1 = 1$ implies that $x \ne -y$. The upper bound holds with equality only if $x = y$.
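A brute-force check of Claim 1 for the two smallest cases may help. The Python sketch below is our own illustration, not part of the original argument, and the helper names (`build_Q`, `dot`, `claim1_range`) are hypothetical. It builds $Q$ for $q = 2$ and $q = 3$ (so $n = 6$ and $n = 10$), confirms $|Q| = 2^{n-2}$ and the congruence, and returns the smallest and largest value of $\frac14(\langle x,y\rangle+2)$ over distinct pairs. Larger $q$ are out of reach this way, since $|Q| = 2^{4q-4}$.

```python
from itertools import product


def build_Q(q):
    """Q = all x in {+1,-1}^n with x_1 = 1 and an even number of -1 entries, n = 4q - 2."""
    n = 4 * q - 2
    return [
        (1,) + tail
        for tail in product((1, -1), repeat=n - 1)
        if tail.count(-1) % 2 == 0
    ]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def claim1_range(q):
    """Return (min, max) of (<x,y> + 2) / 4 over distinct x, y in Q, checking integrality."""
    Q = build_Q(q)
    assert len(Q) == 2 ** (4 * q - 4)  # |Q| = 2^(n-2)
    values = set()
    for i, x in enumerate(Q):
        for y in Q[i + 1:]:
            s = dot(x, y) + 2
            assert s % 4 == 0  # <x, y> is congruent to -2 (equivalently 2) mod 4
            values.add(s // 4)
    return min(values), max(values)


print(claim1_range(2), claim1_range(3))  # (0, 1) (-1, 2)
```

In both cases the range from Claim 1, $-(q-2)$ to $q-1$, is attained at both ends.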


Claim 2. For any $y \in Q'$, the polynomial in $n$ variables $x_1, \dots, x_n$ of degree $q-2$ given by
$$F_y(x) := P\big(\tfrac14(\langle x,y\rangle + 2)\big) = \binom{\frac14(\langle x,y\rangle + 2) - 2}{q-2}$$
satisfies that $F_y(x)$ is divisible by $p$ for every $x \in Q' \setminus \{y\}$, but not for $x = y$.

The representation by a binomial coefficient shows that $F_y(x)$ is an integer-valued polynomial. For $x = y$, we get $\frac14(\langle y,y\rangle + 2) = q$ and hence $F_y(y) = P(q) = \binom{q-2}{q-2} = 1$. For $x \ne y$, the Lemma yields that $F_y(x)$ is not divisible by $p$ if and only if $\frac14(\langle x,y\rangle + 2)$ is congruent to 0 or 1 (mod $q$). By Claim 1, this happens only if $\frac14(\langle x,y\rangle + 2)$ is either 0 or 1, that is, if $\langle x,y\rangle \in \{-2,+2\}$. So $x$ and $y$ must be nearly-orthogonal for this, which contradicts the definition of $Q'$.
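For $q = 3$ (so $p = 3$) the content of Claim 2 can be checked on all of $Q$ rather than only on a subset $Q'$: by the argument above, $p$ fails to divide $F_y(x)$ exactly when $x = y$ or $x, y$ are nearly-orthogonal. The sketch below (hypothetical helper names, brute force, feasible only for $q \le 3$) tests that equivalence for every ordered pair.

```python
from itertools import product
from math import factorial, prod


def build_Q(q):
    n = 4 * q - 2
    return [(1,) + t for t in product((1, -1), repeat=n - 1) if t.count(-1) % 2 == 0]


def P(z, q):
    """binom(z-2, q-2) as a polynomial in z, evaluated exactly at an integer z."""
    return prod(z - k for k in range(2, q)) // factorial(q - 2)


def F(y, x, q):
    """F_y(x) = P((<x,y> + 2) / 4); by Claim 1 the division by 4 is exact on Q."""
    s = sum(a * b for a, b in zip(x, y))
    return P((s + 2) // 4, q)


def claim2_holds(p, m):
    """p does not divide F_y(x) exactly when x = y or x, y are nearly-orthogonal."""
    q = p ** m
    Q = build_Q(q)
    for y in Q:
        if F(y, y, q) != 1:
            return False
        for x in Q:
            s = sum(a * b for a, b in zip(x, y))
            if (F(y, x, q) % p != 0) != (x == y or abs(s) == 2):
                return False
    return True


print(claim2_holds(2, 1), claim2_holds(3, 1))  # True True
```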
Claim 3. The same is true for the polynomials $\overline{F}_y(x)$ in the $n-1$ variables $x_2, \dots, x_n$ that are obtained as follows: Expand $F_y(x)$ into monomials and remove the variable $x_1$, and reduce all higher powers of other variables, by substituting $x_1 = 1$, and $x_i^2 = 1$ for $i > 1$. The polynomials $\overline{F}_y(x)$ have degree at most $q-2$.

The vectors $x \in Q \subseteq \{+1,-1\}^n$ all satisfy $x_1 = 1$ and $x_i^2 = 1$. Thus the substitutions do not change the values of the polynomials on the set $Q$. They also do not increase the degree, so $\overline{F}_y(x)$ has degree at most $q-2$.


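Claim 4. There is no linear relation (with rational coefficients) between the polynomials $\overline{F}_y(x)$, that is, the polynomials $\overline{F}_y(x)$, $y \in Q'$, are linearly independent over $\mathbb{Q}$. In particular, they are distinct.

Assume that there is a relation of the form $\sum_{y \in Q'} \alpha_y \overline{F}_y(x) = 0$ such that not all coefficients $\alpha_y$ are zero. After multiplication with a suitable scalar we may assume that all the coefficients are integers, but not all of them are divisible by $p$. But then for every $y \in Q'$ the evaluation at $x := y$ gives $0 = \alpha_y \overline{F}_y(y) + \sum_{y' \ne y} \alpha_{y'} \overline{F}_{y'}(y)$, where every term of the last sum is divisible by $p$ by Claims 2 and 3. Hence $\alpha_y \overline{F}_y(y)$ is divisible by $p$, and so is $\alpha_y$, since $\overline{F}_y(y)$ is not. As this holds for every $y \in Q'$, it contradicts the choice of the coefficients.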


Claim 5. $|Q'|$ is bounded by the number of squarefree monomials of degree at most $q-2$ in $n-1$ variables, which is $\sum_{i=0}^{q-2}\binom{n-1}{i}$.

By construction the polynomials $\overline{F}_y$ are squarefree: none of their monomials contains a variable with higher degree than 1. Thus each $\overline{F}_y(x)$ is a linear combination of the squarefree monomials of degree at most $q-2$ in the $n-1$ variables $x_2, \dots, x_n$. Since the polynomials $\overline{F}_y(x)$ are linearly independent, their number (which is $|Q'|$) cannot be larger than the number of monomials in question.
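The count in Claim 5 can be confirmed by enumerating squarefree monomials as sets of variables. This small sketch (our own; `squarefree_monomials` and `claim5_bound` are hypothetical names) compares the enumeration with $\sum_{i=0}^{q-2}\binom{n-1}{i}$ for a few $q$ and evaluates the sum for $q = 9$, which is the sum that appears in the definition of $g(9)$.

```python
from itertools import combinations
from math import comb


def squarefree_monomials(num_vars, max_degree):
    """Yield each squarefree monomial as the tuple of its variable indices (() is the constant 1)."""
    for degree in range(max_degree + 1):
        yield from combinations(range(num_vars), degree)


def claim5_bound(q):
    """sum_{i=0}^{q-2} binom(n-1, i) with n = 4q - 2."""
    n = 4 * q - 2
    return sum(comb(n - 1, i) for i in range(q - 1))


for q in (2, 3, 4, 5):
    n = 4 * q - 2
    brute = sum(1 for _ in squarefree_monomials(n - 1, q - 2))
    print(q, brute, claim5_bound(q))  # the two counts agree
print(claim5_bound(9))  # 5663890
```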


Details for (2): The first column of $xx^T$ is $x$ (because $x_1 = 1$). Thus for distinct $x \in Q$ we obtain distinct matrices $M(x) := xx^T$. We interpret these matrices as vectors of length $n^2$ with components $x_i x_j$. A simple computation
$$\langle M(x), M(y)\rangle = \sum_{i=1}^n \sum_{j=1}^n (x_i x_j)(y_i y_j) = \Big(\sum_{i=1}^n x_i y_i\Big)\Big(\sum_{j=1}^n x_j y_j\Big) = \langle x, y\rangle^2 \ge 4$$
shows that the scalar product of $M(x)$ and $M(y)$ is minimized if and only if $x, y \in Q$ are nearly-orthogonal.


Details for (3): Let $U(x) \in \{+1,-1\}^d$ denote the vector of all subdiagonal entries of $M(x)$. Since $M(x) = xx^T$ is symmetric with diagonal values $+1$, we see that $M(x) \ne M(y)$ implies $U(x) \ne U(y)$. Furthermore,
$$4 \le \langle M(x), M(y)\rangle = 2\langle U(x), U(y)\rangle + n,$$
that is,
$$\langle U(x), U(y)\rangle \ge -\frac{n-4}{2},$$
with equality if and only if $x$ and $y$ are nearly-orthogonal. Since all the vectors $U(x) \in S$ have the same length $\sqrt{\langle U(x), U(x)\rangle} = \sqrt{\binom n2}$, we have $\|U(x) - U(y)\|^2 = 2\binom n2 - 2\langle U(x), U(y)\rangle$. This means that the maximal distance between points $U(x), U(y) \in S$ is achieved exactly when $x$ and $y$ are nearly-orthogonal.
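The two identities used in Details for (2) and (3), $\langle M(x), M(y)\rangle = \langle x,y\rangle^2$ and $\langle M(x), M(y)\rangle = 2\langle U(x), U(y)\rangle + n$, can be verified exhaustively for $q = 2$ and $q = 3$. The sketch below is an illustration only (helper names are ours); it flattens $xx^T$ row by row and takes the entries with $i > j$ for $U(x)$.

```python
from itertools import product


def build_Q(q):
    n = 4 * q - 2
    return [(1,) + t for t in product((1, -1), repeat=n - 1) if t.count(-1) % 2 == 0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def M(x):
    """The matrix x x^T, flattened row by row into a vector of length n^2."""
    return [xi * xj for xi in x for xj in x]


def U(x):
    """The binom(n, 2) subdiagonal entries (x x^T)_{ij} with i > j."""
    return [x[i] * x[j] for i in range(len(x)) for j in range(i)]


def check_steps_2_and_3(q):
    Q, n = build_Q(q), 4 * q - 2
    Ms, Us = [M(x) for x in Q], [U(x) for x in Q]
    for a in range(len(Q)):
        for b in range(a + 1, len(Q)):
            s = dot(Q[a], Q[b])
            u = dot(Us[a], Us[b])
            assert dot(Ms[a], Ms[b]) == s * s >= 4   # <M(x), M(y)> = <x, y>^2 >= 4
            assert s * s == 2 * u + n                # ... = 2 <U(x), U(y)> + n
            # the minimum -(n - 4)/2 is attained exactly for nearly-orthogonal pairs
            assert (2 * u == -(n - 4)) == (abs(s) == 2)
    return True


print(check_steps_2_and_3(2), check_steps_2_and_3(3))  # True True
```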


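Details for (4): For $q = 9$ we have $g(9) \approx 758.31$, which is greater than $d + 1 = \binom{34}{2} + 1 = 562$.

To obtain a general bound for large $d$, we use monotonicity and unimodality of the binomial coefficients and the estimates $n! > e\left(\frac ne\right)^n$ and $n! < en\left(\frac ne\right)^n$ (see the appendix to Chapter 2) and derive
$$\sum_{i=0}^{q-2}\binom{4q-3}{i} < q\binom{4q}{q} = q\,\frac{(4q)!}{q!\,(3q)!} < q\,\frac{e\,4q\left(\frac{4q}{e}\right)^{4q}}{e\left(\frac{q}{e}\right)^{q}\, e\left(\frac{3q}{e}\right)^{3q}} = \frac{4q^2}{e}\left(\frac{256}{27}\right)^q.$$

Thus we conclude
$$f(d) \ge g(q) = \frac{2^{4q-4}}{\sum_{i=0}^{q-2}\binom{4q-3}{i}} > \frac{e}{64q^2}\left(\frac{27}{16}\right)^q.$$

From this, with
$$d = (2q-1)(4q-3) = 5q^2 + (q-3)(3q-1) \ge 5q^2 \quad\text{for } q \ge 3,$$
$$q = \frac58 + \sqrt{\frac d8 + \frac1{64}} > \sqrt{\frac d8}, \qquad\text{and}\qquad \left(\frac{27}{16}\right)^{\sqrt{1/8}} > 1.2032,$$
we get
$$f(d) > \frac{e}{13d}\,(1.2032)^{\sqrt d} > (1.2)^{\sqrt d} \quad\text{for all large enough } d.$$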
A counterexample of dimension 560 is obtained by noting that for $q = 9$ the quotient $g(q) \approx 758$ is much larger than the dimension $d(q) = 561$. Thus one gets a counterexample for $d = 560$ by taking only the "three fourths" of the points in $S$ that satisfy $x_{21} + x_{31} + x_{32} = -1$.
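The numbers in Details for (4) are easy to reproduce with exact rational arithmetic. The following sketch (our own; `g` and `dimension` are hypothetical names) recomputes $g(9)$ and $d = 561$, checks which small prime powers already give a counterexample, redoes the arithmetic of the $d = 560$ remark, and tests the asymptotic inequality numerically for $2 \le q < 60$. These are numerical illustrations, not a substitute for the proof.

```python
from fractions import Fraction
from math import comb, e, sqrt


def g(q):
    """g(q) = 2^(4q-4) / sum_{i=0}^{q-2} binom(4q-3, i), as an exact fraction."""
    return Fraction(2 ** (4 * q - 4), sum(comb(4 * q - 3, i) for i in range(q - 1)))


def dimension(q):
    """d = (2q - 1)(4q - 3), which equals binom(4q - 2, 2)."""
    return (2 * q - 1) * (4 * q - 3)


g9 = float(g(9))
print(round(g9, 2), dimension(9))   # 758.31 561

# Among the prime powers q <= 9, only q = 9 gives g(q) > d + 1.
print([q for q in (2, 3, 4, 5, 7, 8, 9) if g(q) > dimension(q) + 1])   # [9]

# Arithmetic behind the d = 560 remark: three quarters of g(9) still exceed 560 + 1.
print(0.75 * g9 > 561)   # True

# Numerical sanity checks of the asymptotic estimate and its ingredients.
print(all(dimension(q) == 5 * q * q + (q - 3) * (3 * q - 1) for q in range(2, 60)))   # True
print(all(float(g(q)) > e / (64 * q * q) * (27 / 16) ** q for q in range(2, 60)))     # True
print((27 / 16) ** (1 / sqrt(8)))   # about 1.20321, i.e. above 1.2032
```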
Borsuk’s conjecture is known to be true for $d \le 3$, but it has not been verified for any larger dimension. In contrast to this, it is true up to $d = 8$ if we restrict ourselves to subsets $S \subseteq \{1,-1\}^d$, as constructed above (see [9]). In either case it is quite possible that counterexamples can be found in quite small dimensions.
“Hilbert’s seaside resort hotel”


Sets, functions, 
and the continuum hypothesis 


Set theory, founded by Georg Cantor in the second half of the 19th cen- 
tury, has profoundly transformed mathematics. Modern day mathematics 
is unthinkable without the concept of a set, or as David Hilbert put it: 
“Nobody will drive us from the paradise (of set theory) that Cantor has 
created for us.” 

One of Cantor’s basic concepts is the notion of the size or cardinality of a set $M$, denoted by $|M|$. For finite sets, this presents no difficulties: we just count the number of elements and say that $M$ is an $n$-set or has size $n$, if $M$ contains precisely $n$ elements. Thus two finite sets $M$ and $N$ have equal size, $|M| = |N|$, if they contain the same number of elements.

To carry this notion of equal size over to infinite sets, we use the following 
suggestive thought experiment for finite sets. Suppose a number of people 
board a bus. When will we say that the number of people is the same as the 
number of available seats? Simple enough, we let all people sit down. If 
everyone finds a seat, and no seat remains empty, then and only then do the 
two sets (of the people and of the seats) agree in number. In other words, 
the two sizes are the same if there is a bijection of one set onto the other. 


This is then our definition: Two arbitrary sets $M$ and $N$ (finite or infinite) are said to be of equal size or cardinality, if and only if there exists a bijection from $M$ onto $N$. Clearly, this notion of equal size is an equivalence relation, and we can thus associate a number, called cardinal number, to every class of equal-sized sets. For example, we obtain for finite sets the cardinal numbers $0, 1, 2, \dots, n, \dots$ where $n$ stands for the class of $n$-sets, and, in particular, 0 for the empty set $\emptyset$. We further observe the obvious fact that a proper subset of a finite set $M$ invariably has smaller size than $M$.

The theory becomes very interesting (and highly non-intuitive) when we turn to infinite sets. Consider the set $\mathbb{N} = \{1, 2, 3, \dots\}$ of natural numbers. We call a set $M$ countable if it can be put in one-to-one correspondence with $\mathbb{N}$. In other words, $M$ is countable if we can list the elements of $M$ as $m_1, m_2, m_3, \dots$. But now a strange phenomenon occurs. Suppose we add to $\mathbb{N}$ a new element $x$. Then $\mathbb{N} \cup \{x\}$ is still countable, and hence has equal size with $\mathbb{N}$!

This fact is delightfully illustrated by “Hilbert’s hotel.” Suppose a hotel has countably many rooms, numbered $1, 2, 3, \dots$, with guest $g_i$ occupying room $i$; so the hotel is fully booked. Now a new guest $x$ arrives asking for a room, whereupon the hotel manager tells him: Sorry, all rooms are taken. No problem, says the new arrival, just move guest $g_1$ to room 2, $g_2$ to room 3, $g_3$ to room 4, and so on, and I will then take room 1. To the manager’s surprise (he is not a mathematician) this works; he can still put up all guests plus the new arrival $x$!


Now it is clear that he can also put up another guest y, and another one z, 
and so on. In particular, we note that, in contrast to finite sets, it may well 
happen that a proper subset of an infinite set M has the same size as M. In 
fact, as we will see, this is a characterization of infinity: A set is infinite if 
and only if it has the same size as some proper subset. 

Let us leave Hilbert’s hotel and look at our familiar number sets. The set $\mathbb{Z}$ of integers is again countable, since we may enumerate $\mathbb{Z}$ in the form $\mathbb{Z} = \{0, 1, -1, 2, -2, 3, -3, \dots\}$. It may come as more of a surprise that the rationals can be enumerated in a similar way.


Theorem 1. The set Q of rational numbers is countable. 


Proof. By listing the set $\mathbb{Q}^+$ of positive rationals as suggested in the figure in the margin, but leaving out numbers already encountered, we see that $\mathbb{Q}^+$ is countable, and hence so is $\mathbb{Q}$ by listing 0 at the beginning and $-\frac pq$ right after $\frac pq$. With this listing
$$\mathbb{Q} = \{0, 1, -1, 2, -2, \tfrac12, -\tfrac12, \tfrac13, -\tfrac13, 3, -3, 4, -4, \tfrac32, -\tfrac32, \dots\}.$$
Another way to interpret the figure is the following statement: 


The union of countably many countable sets $M_n$ is again countable.

Indeed, set $M_n = \{a_{n1}, a_{n2}, a_{n3}, \dots\}$ and list
$$\bigcup_{n=1}^{\infty} M_n = \{a_{11}, a_{21}, a_{12}, a_{13}, a_{22}, a_{31}, a_{41}, a_{32}, a_{23}, a_{14}, \dots\}$$
precisely as before.


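Let us contemplate Cantor’s enumeration of the positive rationals a bit more. Looking at the figure we obtained the sequence
$$\tfrac11, \tfrac21, \tfrac12, \tfrac13, \tfrac22, \tfrac31, \tfrac41, \tfrac32, \tfrac23, \tfrac14, \tfrac15, \tfrac24, \tfrac33, \tfrac42, \tfrac51, \dots$$
and then had to strike out the duplicates such as $\frac22 = \frac11$ or $\frac24 = \frac12$.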
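Cantor’s diagonal walk, including the removal of duplicates, is short enough to write out. In this Python sketch (function names are ours) a reduced $i/j$ always appears before its multiples $ki/kj$, since those lie on later diagonals, so skipping fractions with $\gcd(i,j) > 1$ is the same as leaving out numbers already encountered. Read as index pairs $(n, i)$, the raw walk also reproduces the listing $a_{11}, a_{21}, a_{12}, \dots$ of a countable union.

```python
from fractions import Fraction
from itertools import islice
from math import gcd


def cantor_pairs():
    """Walk the diagonals i + j = 2, 3, 4, ... of the array (i, j), alternating direction."""
    s = 2
    while True:
        # odd s: numerator runs down from s-1 to 1; even s: it runs up from 1 to s-1
        numerators = range(s - 1, 0, -1) if s % 2 else range(1, s)
        for i in numerators:
            yield i, s - i
        s += 1


def positive_rationals():
    """Cantor's listing of the positive rationals without duplicates."""
    for i, j in cantor_pairs():
        if gcd(i, j) == 1:
            yield Fraction(i, j)


def all_rationals():
    """0 first, then each positive r followed by -r."""
    yield Fraction(0)
    for r in positive_rationals():
        yield r
        yield -r


print(list(islice(cantor_pairs(), 10)))
# [(1, 1), (2, 1), (1, 2), (1, 3), (2, 2), (3, 1), (4, 1), (3, 2), (2, 3), (1, 4)]
print([str(r) for r in islice(all_rationals(), 15)])
# ['0', '1', '-1', '2', '-2', '1/2', '-1/2', '1/3', '-1/3', '3', '-3', '4', '-4', '3/2', '-3/2']
```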


But there is a listing that is even more elegant and systematic, and which contains no duplicates, found only quite recently by Neil Calkin and Herbert Wilf. Their new list starts as follows:
$$\tfrac11, \tfrac12, \tfrac21, \tfrac13, \tfrac32, \tfrac23, \tfrac31, \tfrac14, \tfrac43, \tfrac35, \tfrac52, \tfrac25, \tfrac53, \tfrac34, \tfrac41, \tfrac15, \dots$$

Here the denominator of the $n$-th rational number equals the numerator of the $(n+1)$-st number. In other words, the $n$-th fraction is $b(n)/b(n+1)$, where $(b(n))_{n\ge0}$ is a sequence that starts with
$$(1, 1, 2, 1, 3, 2, 3, 1, 4, 3, 5, 2, 5, 3, 4, 1, 5, \dots).$$

This sequence was first studied by a German mathematician, Moritz Abraham Stern, in a paper from 1858, and it has become known as “Stern’s diatomic series.”
How do we obtain this sequence, and hence the Calkin–Wilf listing of the positive fractions? Consider the infinite binary tree in the margin. We immediately note its recursive rule:

• $\frac11$ is on top of the tree, and
• every node $\frac ij$ has two sons: the left son is $\frac{i}{i+j}$ and the right son is $\frac{i+j}{j}$.

We can easily check the following four properties:

(1) All fractions in the tree are reduced, that is, if $\frac rs$ appears in the tree, then $r$ and $s$ are relatively prime.

This holds for the top $\frac11$, and then we use induction downward. If $r$ and $s$ are relatively prime, then so are $r$ and $r+s$, as well as $s$ and $r+s$.

(2) Every reduced fraction $\frac rs > 0$ appears in the tree.

We use induction on the sum $r+s$. The smallest value is $r+s = 2$, that is $\frac rs = \frac11$, and this appears at the top. If $r > s$, then $\frac{r-s}{s}$ appears in the tree by induction, and so we get $\frac rs$ as its right son. Similarly, if $r < s$, then $\frac{r}{s-r}$ appears, which has $\frac rs$ as its left son.

(3) Every reduced fraction appears exactly once.

The argument is similar. If $\frac rs$ appears more than once, then $r \ne s$, since any node in the tree except the top is of the form $\frac{i}{i+j} < 1$ or $\frac{i+j}{j} > 1$. But if $r > s$ or $r < s$, then we argue by induction as before.

Every positive rational therefore appears exactly once in our tree, and we may write them down listing the numbers level-by-level from left to right. This yields precisely the initial segment shown above.

(4) The denominator of the $n$-th fraction in our list equals the numerator of the $(n+1)$-st.

This is certainly true for $n = 0$, or when the $n$-th fraction is a left son. Suppose the $n$-th number $\frac rs$ is a right son. If $\frac rs$ is at the right boundary, then $s = 1$, and the successor lies at the left boundary and has numerator 1. Finally, if $\frac rs$ is in the interior, and $\frac{r'}{s'}$ is the next fraction in our sequence, then $\frac rs$ is the right son of $\frac{r-s}{s}$, $\frac{r'}{s'}$ is the left son of $\frac{r'}{s'-r'}$, and by induction the denominator of $\frac{r-s}{s}$ is the numerator of $\frac{r'}{s'-r'}$, so we get $s = r'$.

Well, this is nice, but there is even more to come. There are two natural questions:

– Does the sequence $(b(n))_{n\ge0}$ have a “meaning”? That is, does $b(n)$ count anything simple?

– Given $\frac rs$, is there an easy way to determine the successor in the listing?
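The tree rule and properties (1) to (4) can be explored with a breadth-first traversal, which lists the tree level by level from left to right. The sketch below is our own illustration and checks the properties on the first twelve levels. For property (2) it relies on the fact that passing from $\frac rs$ to its parent lowers $r + s$ by at least 1, so $\frac rs$ sits at depth at most $r + s - 2$.

```python
from collections import deque
from itertools import islice
from math import gcd


def calkin_wilf_levels():
    """Traverse the Calkin-Wilf tree level by level, left to right, yielding (i, j) for i/j."""
    queue = deque([(1, 1)])              # 1/1 on top
    while True:
        i, j = queue.popleft()
        yield i, j
        queue.append((i, i + j))         # left son  i/(i+j)
        queue.append((i + j, j))         # right son (i+j)/j


nodes = list(islice(calkin_wilf_levels(), 2 ** 12 - 1))   # levels 0, ..., 11
node_set = set(nodes)
print(nodes[:8])   # [(1, 1), (1, 2), (2, 1), (1, 3), (3, 2), (2, 3), (3, 1), (1, 4)]

assert all(gcd(r, s) == 1 for r, s in nodes)                 # property (1)
assert len(node_set) == len(nodes)                            # property (3)
# property (2): every reduced r/s with r + s <= 13 lies within depth 11
assert all((r, s) in node_set
           for r in range(1, 13) for s in range(1, 14 - r) if gcd(r, s) == 1)
assert all(a[1] == b[0] for a, b in zip(nodes, nodes[1:]))    # property (4)
```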
To answer the first question, we work out that the node $b(n)/b(n+1)$ has the two sons $b(2n+1)/b(2n+2)$ and $b(2n+2)/b(2n+3)$. By the set-up of the tree we obtain the recursions
$$b(2n+1) = b(n) \quad\text{and}\quad b(2n+2) = b(n) + b(n+1). \qquad (1)$$
With $b(0) = 1$ the sequence $(b(n))_{n\ge0}$ is completely determined by (1).

So, is there a “nice” “known” sequence which obeys the same recursion? Yes, there is. We know that any number $n$ can be uniquely written as a sum of distinct powers of 2; this is the usual binary representation of $n$. A hyper-binary representation of $n$ is a representation of $n$ as a sum of powers of 2, where every power $2^k$ appears at most twice. Let $h(n)$ be the number of such representations for $n$ (with $h(0) = 1$, counting the empty sum). For example, $h(6) = 3$, with the hyper-binary representations
$$6 = 4 + 2, \qquad 6 = 4 + 1 + 1, \qquad 6 = 2 + 2 + 1 + 1.$$
You are invited to check that the sequence $h(n)$ obeys the recursion (1), and this gives $b(n) = h(n)$ for all $n$.

Incidentally, we have proved a surprising fact: Let $\frac rs$ be a reduced fraction; then there exists precisely one integer $n$ with $r = h(n)$ and $s = h(n+1)$.
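Both sides of $b(n) = h(n)$ can be computed independently: $b(n)$ from the recursion (1), and $h(n)$ by brute-force counting of hyper-binary representations. The sketch below uses our own names; it counts representations by deciding, power by power, whether $2^k$ is used 0, 1 or 2 times. It compares the two sequences for $n < 1000$ and also checks that no pair $(h(n), h(n+1))$ repeats.

```python
from functools import lru_cache


def stern(count):
    """First `count` values of b(n): b(0) = 1, b(2n+1) = b(n), b(2n+2) = b(n) + b(n+1)."""
    b = [1]
    for m in range(1, count):
        if m % 2 == 1:                      # m = 2n + 1
            b.append(b[(m - 1) // 2])
        else:                               # m = 2n + 2
            n = (m - 2) // 2
            b.append(b[n] + b[n + 1])
    return b


@lru_cache(maxsize=None)
def hyperbinary(n, k=0):
    """Count ways to write n as a sum of powers 2^j with j >= k, each power used at most twice."""
    if n == 0:
        return 1                            # the empty sum, so h(0) = 1
    power = 1 << k
    if power > n:
        return 0
    return sum(hyperbinary(n - c * power, k + 1) for c in range(3) if c * power <= n)


b = stern(1000)
print(b[:17])           # [1, 1, 2, 1, 3, 2, 3, 1, 4, 3, 5, 2, 5, 3, 4, 1, 5]
print(hyperbinary(6))   # 3
print(b == [hyperbinary(n) for n in range(1000)])   # True
print(len(set(zip(b, b[1:]))) == len(b) - 1)        # True: pairs (h(n), h(n+1)) are distinct
```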
Let us look at the second question. We have in our tree
$$\frac rs \text{ has the sons } \frac{r}{r+s} \text{ and } \frac{r+s}{s}, \qquad\text{that is, with } x := \frac rs, \qquad x \text{ has the sons } \frac{x}{x+1} \text{ and } x+1.$$

We now use this to generate an even larger infinite binary tree (without a root) as follows:

[Figure: the extended infinite binary tree without a root]

In this tree all rows are equal, and they all display the Calkin–Wilf listing of the positive rationals (starting with an additional $\frac01$).
So how does one get from one rational to the next? To answer this, we first record that for every rational $x$ its right son is $x+1$, the right grand-son is $x+2$, so the $k$-fold right son is $x+k$. Similarly, the left son of $x$ is $\frac{x}{1+x}$, whose left son is $\frac{x}{1+2x}$, and so on: The $k$-fold left son of $x$ is $\frac{x}{1+kx}$.

Now to find how to get from $\frac rs = x$ to the “next” rational $f(x)$ in the listing, we have to analyze the situation depicted in the margin. In fact, if we consider any nonnegative rational number $x$ in our infinite binary tree, then it is the $k$-fold right son of the left son of some rational $y \ge 0$ (for some $k \ge 0$), while $f(x)$ is given as the $k$-fold left son of the right son of the same $y$. Thus with the formulas for $k$-fold left sons and $k$-fold right sons, we get
$$x = \frac{y}{1+y} + k, \qquad f(x) = \frac{y+1}{1+k(y+1)},$$
as claimed in the figure in the margin. Here $k = \lfloor x\rfloor$ is the integral part of $x$, while $\frac{y}{1+y} = \{x\}$ is the fractional part. And from this we obtain
$$f(x) = \frac{y+1}{1+k(y+1)} = \frac{1}{\frac{1}{y+1} + k} = \frac{1}{k + 1 - \frac{y}{y+1}} = \frac{1}{\lfloor x\rfloor + 1 - \{x\}}.$$

Thus we have obtained a beautiful formula for the successor $f(x)$ of $x$, first found by Moshe Newman:

The function
$$x \longmapsto f(x) = \frac{1}{\lfloor x\rfloor + 1 - \{x\}}$$
generates the Calkin–Wilf sequence
$$\tfrac11 \mapsto \tfrac12 \mapsto \tfrac21 \mapsto \tfrac13 \mapsto \tfrac32 \mapsto \tfrac23 \mapsto \tfrac31 \mapsto \tfrac14 \mapsto \tfrac43 \mapsto \cdots$$
which contains every positive rational number exactly once.
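Newman’s formula translates directly into code. The sketch below (names are ours) iterates $f$ with exact `Fraction` arithmetic, since floating point would accumulate rounding errors after a few steps. It reproduces the start of the Calkin–Wilf sequence and checks that the first 1023 terms are pairwise distinct and that each denominator equals the next numerator.

```python
from fractions import Fraction
from math import floor


def newman_successor(x):
    """f(x) = 1 / (floor(x) + 1 - {x}), where {x} = x - floor(x) is the fractional part."""
    k = floor(x)               # integral part
    frac = x - k               # fractional part
    return 1 / (k + 1 - frac)


def calkin_wilf_newman(count):
    """The first `count` terms of the Calkin-Wilf sequence, starting from 1/1."""
    x, terms = Fraction(1), []
    for _ in range(count):
        terms.append(x)
        x = newman_successor(x)
    return terms


terms = calkin_wilf_newman(1023)
print([str(t) for t in terms[:9]])    # ['1', '1/2', '2', '1/3', '3/2', '2/3', '3', '1/4', '4/3']
print(len(set(terms)) == len(terms))  # True: no repetitions among the first 1023 terms
print(all(a.denominator == b.numerator for a, b in zip(terms, terms[1:])))  # True
```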


The Calkin–Wilf–Newman way to enumerate the positive rationals has a number of additional remarkable properties. For example, one may ask for a fast way to determine the $n$-th fraction in the sequence, say for $n = 10^6$. Here it is:

To find the $n$-th fraction in the Calkin–Wilf sequence, express $n$ as a binary number $n = (b_k b_{k-1} \dots b_1 b_0)_2$, and then follow the path in the Calkin–Wilf tree that is determined by its digits, starting at $\frac01$. Here $b_i = 1$ means “take the right son,” that is, “add the denominator to the numerator,” while $b_i = 0$ means “take the left son,” that is, “add the numerator to the denominator.”

The figure in the margin shows the resulting path for $n = 25 = (11001)_2$, namely $\frac01 \to \frac11 \to \frac21 \to \frac23 \to \frac25 \to \frac75$. So the 25th number in the Calkin–Wilf sequence is $\frac75$. The reader could easily work out a similar scheme that computes for a given fraction $\frac st$ (the binary representation of) its position $n$ in the Calkin–Wilf sequence.
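The digit-path rule can be sketched as follows; `nth_calkin_wilf` walks from $\frac01$ along the binary digits of $n$. The second function is one possible answer to the exercise of computing the position of a given fraction (our own construction, not from the text): it walks back up the extended tree, using that a fraction $\ge 1$ is a right son and a fraction $< 1$ is a left son. Both names are hypothetical.

```python
from fractions import Fraction


def nth_calkin_wilf(n):
    """n-th term (n >= 1): follow the binary digits of n from the left, starting at 0/1."""
    num, den = 0, 1
    for digit in bin(n)[2:]:
        if digit == "1":
            num += den            # right son: add the denominator to the numerator
        else:
            den += num            # left son: add the numerator to the denominator
    return Fraction(num, den)


def position_in_calkin_wilf(x):
    """Inverse walk: climb from x back to 0/1 and read the steps as binary digits of n."""
    num, den = x.numerator, x.denominator
    bits = []
    while num != 0:
        if num >= den:            # x >= 1 is a right son; its parent is (num - den)/den
            num -= den
            bits.append("1")
        else:                     # x < 1 is a left son; its parent is num/(den - num)
            den -= num
            bits.append("0")
    return int("".join(reversed(bits)), 2)


print(nth_calkin_wilf(25))                       # 7/5
print(position_in_calkin_wilf(Fraction(7, 5)))   # 25
print(nth_calkin_wilf(10 ** 6))
```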


A bijective $f : (0,1] \to (0,1)$


Let us move on to the real numbers $\mathbb{R}$. Are they still countable? No, they are not, and the means by which this is shown, Cantor’s diagonalization method, is not only of fundamental importance for all of set theory, but certainly belongs in The Book as a rare stroke of genius.


Theorem 2. The set R of real numbers is not countable. 


Proof. Any subset $N$ of a countable set $M = \{m_1, m_2, m_3, \dots\}$ is at most countable (that is, finite or countable). In fact, just list the elements of $N$ as they appear in $M$. Accordingly, if we can find a subset of $\mathbb{R}$ which is not countable, then a fortiori $\mathbb{R}$ cannot be countable. The subset $M$ of $\mathbb{R}$ we want to look at is the interval $(0,1]$ of all positive real numbers $r$ with $0 < r \le 1$. Suppose, to the contrary, that $M$ is countable, and let $M = \{r_1, r_2, r_3, \dots\}$ be a listing of $M$. We write $r_n$ as its unique infinite decimal expansion without an infinite sequence of zeros at the end:
$$r_n = 0.a_{n1}a_{n2}a_{n3}\dots,$$
where $a_{ni} \in \{0, 1, \dots, 9\}$ for all $n$ and $i$. For example, $0.7 = 0.6999\dots$ Consider now the doubly infinite array
$$\begin{array}{l} r_1 = 0.a_{11}a_{12}a_{13}\dots \\ r_2 = 0.a_{21}a_{22}a_{23}\dots \\ \quad\vdots \\ r_n = 0.a_{n1}a_{n2}a_{n3}\dots \\ \quad\vdots \end{array}$$

For every $n$, let $b_n$ be the least element of $\{1, 2\}$ that is different from $a_{nn}$. Then $b = 0.b_1b_2b_3\dots b_n\dots$ is a real number in our set $M$ (its expansion contains no zeros at all, so it is the unique expansion of the required type) and hence must have an index, say $b = r_k$. But this cannot be, since $b_k$ is different from $a_{kk}$. And this is the whole proof!
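The diagonal construction itself is completely mechanical. The toy sketch below (our own illustration, with made-up digit rows standing in for the first digits of $r_1, \dots, r_5$) builds the first digits of $b$ and shows that the $n$-th digit of $b$ differs from $a_{nn}$. A real listing is infinite, of course; the code only illustrates the rule for $b_n$.

```python
def diagonal_number(rows):
    """Digits of b = 0.b1 b2 b3 ..., where b_n is the least element of {1, 2} different from a_nn.

    rows[n-1] is a string holding (at least) the first n decimal digits of r_n.
    """
    digits = []
    for n, row in enumerate(rows):          # index n = 0 corresponds to r_1
        a_nn = row[n]
        digits.append("2" if a_nn == "1" else "1")
    return "0." + "".join(digits)


# hypothetical first digits of r_1, ..., r_5 (any list would do)
rows = ["14159", "71828", "41421", "33333", "57721"]
b = diagonal_number(rows)
print(b)  # 0.22112
```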


Let us stay with the real numbers for a moment. We note that all four types of intervals $(0,1)$, $(0,1]$, $[0,1)$ and $[0,1]$ have the same size. As an example, we verify that $(0,1]$ and $(0,1)$ have equal cardinality. The map $f : (0,1] \to (0,1)$, $x \mapsto y$ defined by
$$y := \begin{cases} \frac32 - x & \text{for } \frac12 < x \le 1,\\[2pt] \frac34 - x & \text{for } \frac14 < x \le \frac12,\\[2pt] \frac38 - x & \text{for } \frac18 < x \le \frac14,\\ \quad\vdots \end{cases}$$
does the job. Indeed, the map is bijective, since the range of $y$ in the first line is $\frac12 \le y < 1$, in the second line $\frac14 \le y < \frac12$, in the third line $\frac18 \le y < \frac14$, and so on.
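The piecewise map can be written as a small function on exact rationals. The sketch below uses our own names, with `Fraction` inputs standing in for real numbers; it finds the block $\frac{1}{2^{k+1}} < x \le \frac{1}{2^k}$ containing $x$ and applies $y = \frac{3}{2^{k+1}} - x$. An inverse of the same shape lets us check bijectivity on a finite sample.

```python
from fractions import Fraction


def to_open_interval(x):
    """The map (0,1] -> (0,1): y = 3/2^(k+1) - x on the block 1/2^(k+1) < x <= 1/2^k."""
    if not 0 < x <= 1:
        raise ValueError("x must lie in (0, 1]")
    k = 0
    while x <= Fraction(1, 2 ** (k + 1)):   # find the block containing x
        k += 1
    return Fraction(3, 2 ** (k + 1)) - x


def from_open_interval(y):
    """Inverse map: on 1/2^(k+1) <= y < 1/2^k the preimage is x = 3/2^(k+1) - y."""
    if not 0 < y < 1:
        raise ValueError("y must lie in (0, 1)")
    k = 0
    while y < Fraction(1, 2 ** (k + 1)):
        k += 1
    return Fraction(3, 2 ** (k + 1)) - y


print(to_open_interval(Fraction(1)), to_open_interval(Fraction(1, 2)))   # 1/2 1/4
```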
Next we find that any two intervals (of finite length $> 0$) have equal size by considering the central projection as in the figure. Even more is true: Every interval (of length $> 0$) has the same size as the whole real line $\mathbb{R}$. To see this, look at the bent open interval $(0,1)$ and project it onto $\mathbb{R}$ from the center $S$.

So, in conclusion, any open, half-open, closed (finite or infinite) interval of 
length > 0 has the same size, and we denote this size by c, where c stands 
for continuum (a name sometimes used for the interval [0,1]). 


That finite and infinite intervals have the same size may be expected on second thought, but here is a fact that is downright counter-intuitive.


Theorem 3. The set $\mathbb{R}^2$ of all ordered pairs of real numbers (that is, the real plane) has the same size as $\mathbb{R}$.


The theorem is due to Cantor 1878, as is the idea to merge the decimal 
expansions of two reals into one. The variant of Cantor’s method that we 
are going to present is again from The Book. Abraham Fraenkel attributes 
the trick, which directly yields a bijection, to Julius König.


Proof. It suffices to prove that the set of all pairs $(x, y)$, $0 < x, y \le 1$, can be mapped bijectively onto $(0,1]$. Consider the pair $(x, y)$ and write $x, y$ in their unique non-terminating decimal expansion as in the following example:
$$\begin{aligned} x &= 0.3\ \ 01\ \ 2\ \ 007\ \ 08\ \dots\\ y &= 0.009\ \ 2\ \ 05\ \ 1\ \ 0008\ \dots \end{aligned}$$

Note that we have separated the digits of $x$ and $y$ into groups by always going to the next nonzero digit, inclusive. Now we associate to $(x, y)$ the number $z \in (0,1]$ by writing down the first $x$-group, after that the first $y$-group, then the second $x$-group, and so on. Thus, in our example, we obtain
$$z = 0.3\ \ 009\ \ 01\ \ 2\ \ 2\ \ 05\ \ 007\ \ 1\ \ 08\ \ 0008\ \dots$$

Since neither $x$ nor $y$ exhibits only zeros from a certain point on, we find that the expression for $z$ is again a non-terminating decimal expansion. Conversely, from the expansion of $z$ we can immediately read off the preimage $(x, y)$, and the map is bijective, which ends the proof.
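König’s grouping trick works digit by digit, so it can be sketched on finite digit strings. The following Python sketch (names are ours) assumes the prefixes end with a nonzero digit, so that no unfinished group is lost; it reproduces the example above and recovers $x$ and $y$ from $z$.

```python
def digit_groups(digits):
    """Split a digit string into groups, each running up to and including the next nonzero digit."""
    groups, current = [], ""
    for d in digits:
        current += d
        if d != "0":
            groups.append(current)
            current = ""
    return groups  # a trailing run of zeros (an unfinished group) is dropped


def merge(x_digits, y_digits):
    """Interleave groups: first x-group, first y-group, second x-group, ..."""
    return "".join(a + b for a, b in zip(digit_groups(x_digits), digit_groups(y_digits)))


def split(z_digits):
    """Read off the preimage: groups 1, 3, 5, ... of z belong to x, groups 2, 4, 6, ... to y."""
    groups = digit_groups(z_digits)
    return "".join(groups[0::2]), "".join(groups[1::2])


x, y = "301200708", "00920510008"    # the digits shown in the example
z = merge(x, y)
print(z)          # 30090122050071080008
print(split(z))   # ('301200708', '00920510008')
```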


As $(x, y) \mapsto x + iy$ is a bijection from $\mathbb{R}^2$ onto the complex numbers $\mathbb{C}$, we conclude that $|\mathbb{C}| = |\mathbb{R}| = c$. Why is the result $|\mathbb{R}^2| = |\mathbb{R}|$ so unexpected? Because it goes against our intuition of dimension. It says that the 2-dimensional plane $\mathbb{R}^2$ (and, in general, by induction, the $n$-dimensional space $\mathbb{R}^n$) can be mapped bijectively onto the 1-dimensional line $\mathbb{R}$. Thus dimension is not generally preserved by bijective maps. If, however, we require the map and its inverse to be continuous, then the dimension is preserved, as was first shown by Luitzen Brouwer.
“Cantor and Bernstein painting”


Let us go a little further. So far, we have the notion of equal size. When will we say that $M$ is at most as large as $N$? Mappings provide again the key. We say that the cardinal number $m$ is less than or equal to $n$, if for sets $M$ and $N$ with $|M| = m$, $|N| = n$, there exists an injection from $M$ into $N$. Clearly, the relation $m \le n$ is independent of the representative sets $M$ and $N$ chosen. For finite sets this corresponds again to our intuitive notion: An $m$-set is at most as large as an $n$-set if and only if $m \le n$.

Now we are faced with a basic problem. We would certainly like to have that the usual laws concerning inequalities also hold for cardinal numbers. But is this true for infinite cardinals? In particular, is it true that $m \le n$, $n \le m$ imply $m = n$?

The affirmative answer to this question is provided by the famous Cantor–Bernstein theorem, which Cantor announced in 1883. The first complete proof was presented by Felix Bernstein in Cantor’s seminar in 1897. Further proofs were given by Richard Dedekind, Ernst Zermelo, and others. Our proof is due to Julius König (1906).

Theorem 4. If each of two sets $M$ and $N$ can be mapped injectively into the other, then there is a bijection from $M$ to $N$, that is, $|M| = |N|$.


Proof. We may certainly assume that $M$ and $N$ are disjoint; if not, then we just replace $N$ by a new copy. Let $f : M \to N$ and $g : N \to M$ be the two injections.

Now $f$ and $g$ map back and forth between the elements of $M$ and those of $N$. One way to bring this potentially confusing situation into perfect clarity and order is to align $M \cup N$ into chains of elements: Take an arbitrary element $m_0 \in M$, say, and from this generate a chain of elements by applying $f$, then $g$, then $f$ again, then $g$, and so on. The chain may close up (this is Case 1) if we reach $m_0$ again in this process, or it may continue with distinct elements indefinitely. (The first “duplicate” in the chain cannot be an element different from $m_0$, by injectivity.)

If the chain continues indefinitely, then we try to follow it backwards: From $m_0$ to $g^{-1}(m_0)$ if $m_0$ is in the image of $g$, then to $f^{-1}(g^{-1}(m_0))$ if $g^{-1}(m_0)$ is in the image of $f$, and so on. Three more cases may arise here: The process of following the chain backwards may go on indefinitely (Case 2), it may stop in an element of $M$ that does not lie in the image of $g$ (Case 3), or it may stop in an element of $N$ that does not lie in the image of $f$ (Case 4).

Thus $M \cup N$ splits perfectly into four types of chains, whose elements we may label in such a way that a bijection is simply given by putting $F : m_i \mapsto n_i$. We verify this in the four cases separately:


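Case 1. Finite cycles on $2k+2$ distinct elements ($k \ge 0$):
$$m_0 \xrightarrow{f} n_0 \xrightarrow{g} m_1 \xrightarrow{f} n_1 \xrightarrow{g} \cdots \xrightarrow{g} m_k \xrightarrow{f} n_k \xrightarrow{g} m_0$$

Case 2. Two-way infinite chains of distinct elements:
$$\cdots \xrightarrow{g} m_0 \xrightarrow{f} n_0 \xrightarrow{g} m_1 \xrightarrow{f} n_1 \xrightarrow{g} m_2 \xrightarrow{f} \cdots$$

Case 3. The one-way infinite chains of distinct elements that start at the elements $m_0 \in M \setminus g(N)$:
$$m_0 \xrightarrow{f} n_0 \xrightarrow{g} m_1 \xrightarrow{f} n_1 \xrightarrow{g} m_2 \xrightarrow{f} \cdots$$

Case 4. The one-way infinite chains of distinct elements that start at the elements $n_0 \in N \setminus f(M)$:
$$n_0 \xrightarrow{g} m_0 \xrightarrow{f} n_1 \xrightarrow{g} m_1 \xrightarrow{f} n_2 \xrightarrow{g} \cdots$$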
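The labelling in the four cases describes $F$ concretely: $F(m) = f(m)$ in Cases 1 to 3, and $F(m) = g^{-1}(m)$ in Case 4. The sketch below is our own, with the hypothetical injections $f(m) = 2m$ and $g(n) = 2n + 1$ on two disjoint copies of $\{0, 1, 2, \dots\}$; it decides the case by walking backwards along the chain. It assumes every backward walk terminates. That holds in this example, because each backward step decreases the value until the walk stops at 0, but cycles and two-way infinite chains (Cases 1 and 2) could not be detected by such a finite walk.

```python
def cb_bijection(m, f, f_inv, g_inv):
    """F(m) from König's proof, for injections f: M -> N and g: N -> M.

    f_inv(n) returns the preimage of n under f, or None if n is not in f(M);
    g_inv(m) does the same for g. Assumes the backward chain through m is finite.
    """
    side, current = "M", m
    while True:
        if side == "M":
            previous = g_inv(current)
            if previous is None:      # chain starts in M \ g(N): Case 3, so F = f
                return f(m)
            side, current = "N", previous
        else:
            previous = f_inv(current)
            if previous is None:      # chain starts in N \ f(M): Case 4, so F = g^{-1}
                return g_inv(m)
            side, current = "M", previous


# Hypothetical example: f(m) = 2m and g(n) = 2n + 1; neither injection is onto.
def f(m):
    return 2 * m


def f_inv(n):
    return n // 2 if n % 2 == 0 else None


def g_inv(m):
    return (m - 1) // 2 if m % 2 == 1 else None


F = [cb_bijection(m, f, f_inv, g_inv) for m in range(2000)]
print(F[:8])   # [0, 2, 4, 1, 8, 10, 12, 3]
print(len(set(F)) == len(F), set(range(500)) <= set(F))   # True True
```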


What about the other relations governing inequalities? As usual, we set $m < n$ if $m \le n$, but $m \ne n$. We have just seen that for any two cardinals $m$ and $n$ at most one of the three possibilities
$$m < n, \qquad m = n, \qquad m > n$$
holds, and it follows from the theory of cardinal numbers that, in fact, precisely one relation is true. (See the appendix to this chapter, Proposition 2.) Furthermore, the Cantor–Bernstein Theorem tells us that the relation $<$ is transitive, that is, $m < n$ and $n < p$ imply $m < p$. Thus the cardinalities are arranged in linear order starting with the finite cardinals $0, 1, 2, 3, \dots$.
Invoking the usual Zermelo–Fraenkel axiom system, we easily find that any infinite set $M$ contains a countable subset. In fact, $M$ contains an element, say $m_1$. The set $M \setminus \{m_1\}$ is not empty (since it is infinite) and hence contains an element $m_2$. Considering $M \setminus \{m_1, m_2\}$ we infer the existence of $m_3$, and so on. So, the size of a countable set is the smallest infinite cardinal, usually denoted by $\aleph_0$ (pronounced “aleph zero”).


“The smallest infinite cardinal” 
As a corollary to $\aleph_0 \le m$ for any infinite cardinal $m$, we can immediately prove “Hilbert’s hotel” for any infinite cardinal number $m$, that is, we have $|M \cup \{x\}| = |M|$ for any infinite set $M$. Indeed, $M$ contains a subset $N = \{m_1, m_2, m_3, \dots\}$. Now map $x$ onto $m_1$, $m_1$ onto $m_2$, and so on, keeping the elements of $M \setminus N$ fixed. This gives the desired bijection.
With this we have also proved a result announced earlier: Every infinite set has the same size as some proper subset.
As another consequence of the Cantor–Bernstein theorem we may prove that the set $\mathcal{P}(\mathbb{N})$ of all subsets of $\mathbb{N}$ has cardinality $c$. As noted above, it suffices to show that $|\mathcal{P}(\mathbb{N}) \setminus \{\emptyset\}| = |(0,1]|$. An example of an injective map is
$$f : \mathcal{P}(\mathbb{N}) \setminus \{\emptyset\} \to (0,1], \qquad A \mapsto \sum_{i \in A} 10^{-i},$$
while
$$g : (0,1] \to \mathcal{P}(\mathbb{N}) \setminus \{\emptyset\}, \qquad 0.b_1b_2b_3\dots \mapsto \{b_i 10^i : i \in \mathbb{N},\ b_i \ne 0\}$$
defines an injection in the other direction.
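Both injections can be tried out on finite data. In this sketch (names are ours) `f` is applied to finite nonempty subsets $A$ of $\mathbb{N}$, where the decimal digits of $f(A)$ are 1 exactly at the positions in $A$, and `g` is applied to a finite prefix of a digit sequence. The finite restriction is an assumption of the illustration; the maps in the text act on arbitrary subsets and on full non-terminating expansions.

```python
from fractions import Fraction
from itertools import combinations


def f(A):
    """f(A) = sum of 10^(-i) for i in A (here A is a finite nonempty set of positive integers)."""
    return sum(Fraction(1, 10 ** i) for i in A)


def g(digits):
    """g on a finite digit prefix b1 b2 b3 ...: the set {b_i * 10^i : b_i != 0}."""
    return {b * 10 ** i for i, b in enumerate(digits, start=1) if b != 0}


print(f({1, 3}))            # 101/1000, i.e. 0.101
print(sorted(g([3, 0, 1]))) # [30, 1000]

# f is injective on all 255 nonempty subsets of {1, ..., 8}
subsets = [set(c) for r in range(1, 9) for c in combinations(range(1, 9), r)]
print(len({f(A) for A in subsets}) == len(subsets))   # True
```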


Up to now we know the cardinal numbers $0, 1, 2, \dots, \aleph_0$, and further that the cardinality $c$ of $\mathbb{R}$ is bigger than $\aleph_0$. The passage from $\mathbb{Q}$ with $|\mathbb{Q}| = \aleph_0$ to $\mathbb{R}$ with $|\mathbb{R}| = c$ immediately suggests the next question:

Is $c = |\mathbb{R}|$ the next infinite cardinal number after $\aleph_0$?

Now, of course, we have the problem whether there is a next larger cardinal number, or in other words, whether $\aleph_1$ has a meaning at all. It does; the proof for this is outlined in the appendix to this chapter.

The statement $c = \aleph_1$ became known as the continuum hypothesis. The question whether the continuum hypothesis is true presented for many decades one of the supreme challenges in all of mathematics. The answer, finally given by Kurt Gödel and Paul Cohen, takes us to the limit of logical thought. They showed that the statement $c = \aleph_1$ is independent of the Zermelo–Fraenkel axiom system, in the same way as the parallel axiom is independent of the other axioms of Euclidean geometry. There are models where $c = \aleph_1$ holds, and there are other models of set theory where $c \ne \aleph_1$ holds.

In the light of this fact it is quite interesting to ask whether there are other 
conditions (from analysis, say) which are equivalent to the continuum 
hypothesis. Indeed, it is natural to ask for an analysis example, since his- 
torically the first substantial applications of Cantor’s set theory occurred in 
analysis, specifically in complex function theory. In the following we want 
to present one such instance and its extremely elegant and simple solution 
by Paul Erdős. In 1962 John E. Wetzel, a young instructor at the University
of Illinois, asked the following question: 
Let $\{f_\alpha\}$ be a family of pairwise distinct analytic functions on the complex numbers such that for each $z \in \mathbb{C}$ the set of values $\{f_\alpha(z)\}$ is at most countable (that is, it is either finite or countable); let us call this property $(P_0)$.

Does it then follow that the family itself is at most countable?

Very shortly afterwards Erdős showed that, surprisingly, the answer depends on the continuum hypothesis.

Theorem 5. If $c > \aleph_1$, then every family $\{f_\alpha\}$ satisfying $(P_0)$ is countable. If, on the other hand, $c = \aleph_1$, then there exists some family $\{f_\alpha\}$ with property $(P_0)$ which has size $c$.


For the proof we need some basic facts on cardinal and ordinal numbers. 
For readers who are unfamiliar with these concepts, this chapter has an 
appendix where all the necessary results are collected. 


■ Proof. Assume first $\mathfrak{c} > \aleph_1$. We shall show that for any family $\{f_\alpha\}$ of size $\aleph_1$ of analytic functions there exists a complex number $z_0$ such that all $\aleph_1$ values $f_\alpha(z_0)$ are distinct. Consequently, if a family of functions satisfies $(P_0)$, then it must be countable.

To see this, we make use of our knowledge of ordinal numbers. First, we well-order the family $\{f_\alpha\}$ according to the initial ordinal number $\omega_1$ of $\aleph_1$. This means by Proposition 1 of the appendix that the index set runs through all ordinal numbers $\alpha$ which are smaller than $\omega_1$. Next we show that the set of pairs $(\alpha, \beta)$, $\alpha < \beta < \omega_1$, has size $\aleph_1$. Since any $\beta < \omega_1$ is a countable ordinal, the set of pairs $(\alpha, \beta)$, $\alpha < \beta$, is countable for every fixed $\beta$. Taking the union over all $\aleph_1$-many $\beta$'s, we find from Proposition 6 of the appendix that the set of all pairs $(\alpha, \beta)$, $\alpha < \beta$, has size $\aleph_1$.


Consider now for any pair $\alpha < \beta$ the set
$$S(\alpha,\beta) = \{z \in \mathbb{C} : f_\alpha(z) = f_\beta(z)\}.$$

We claim that each set $S(\alpha, \beta)$ is countable. To verify this, consider the closed disks $C_k$ of radius $k = 1, 2, 3, \ldots$ around the origin in the complex plane. If $f_\alpha$ and $f_\beta$ agree on infinitely many points in some $C_k$, then these points have an accumulation point in the compact disk $C_k$, and so $f_\alpha$ and $f_\beta$ are identical by the identity theorem for analytic functions. Hence $f_\alpha$ and $f_\beta$ agree only in finitely many points in each $C_k$, and hence in at most countably many points altogether. Now we set
$$S = \bigcup_{\alpha < \beta} S(\alpha, \beta).$$

Again by Proposition 6, we find that $S$ has size at most $\aleph_1$, as each set $S(\alpha, \beta)$ is countable. And here is the punch line: Because, as we know, $\mathbb{C}$ has size $\mathfrak{c}$, and $\mathfrak{c}$ is larger than $\aleph_1$ by assumption, there exists a complex number $z_0$ not in $S$, and for this $z_0$ all $\aleph_1$ values $f_\alpha(z_0)$ are distinct.
Next we assume $\mathfrak{c} = \aleph_1$. Consider the set $D \subseteq \mathbb{C}$ of complex numbers $p + iq$ with rational real and imaginary part. Since $p$ ranges over the countable set $\mathbb{Q}$ and for each $p$ the set $\{p + iq : q \in \mathbb{Q}\}$ is countable, we find that $D$ is countable. Furthermore, $D$ is a dense set in $\mathbb{C}$: Every open disk in the complex plane contains some point of $D$. Let $\{z_\alpha : 0 \le \alpha < \omega_1\}$ be a well-ordering of $\mathbb{C}$. We shall now construct a family $\{f_\beta : 0 \le \beta < \omega_1\}$ of $\aleph_1$-many distinct analytic functions such that
$$f_\beta(z_\alpha) \in D \quad \text{whenever } \alpha < \beta. \qquad (1)$$

Any such family satisfies the condition $(P_0)$. Indeed, each point $z \in \mathbb{C}$ has some index, say $z = z_\alpha$. Now, for all $\beta > \alpha$, the values $\{f_\beta(z_\alpha)\}$ lie in the countable set $D$. Since $\alpha$ is a countable ordinal number, the functions $f_\beta$ with $\beta \le \alpha$ will contribute at most countably many further values $f_\beta(z_\alpha)$, so that the set of all values $\{f_\beta(z_\alpha)\}$ is likewise at most countable. Hence, if we can construct a family $\{f_\beta\}$ satisfying (1), then the second part of the theorem is proved.

The construction of $\{f_\beta\}$ is by transfinite induction. For $f_0$ we may take any analytic function, for example $f_0 =$ constant. Suppose $f_\beta$ has already been constructed for all $\beta < \gamma$. Since $\gamma$ is a countable ordinal, we may reorder $\{f_\beta : 0 \le \beta < \gamma\}$ into a sequence $g_1, g_2, g_3, \ldots$. The same reordering of $\{z_\alpha : 0 \le \alpha < \gamma\}$ yields a sequence $w_1, w_2, w_3, \ldots$. We shall now construct a function $f_\gamma$ satisfying for each $n$ the conditions
$$f_\gamma(w_n) \in D \quad \text{and} \quad f_\gamma(w_n) \ne g_n(w_n). \qquad (2)$$

The second condition will ensure that all functions $f_\gamma$ $(0 \le \gamma < \omega_1)$ are distinct, and the first condition is just (1), implying $(P_0)$ by our previous argument. Notice that the condition $f_\gamma(w_n) \ne g_n(w_n)$ is once more a diagonalization argument.


To construct $f_\gamma$, we write
$$f_\gamma(z) = \varepsilon_0 + \varepsilon_1(z - w_1) + \varepsilon_2(z - w_1)(z - w_2) + \varepsilon_3(z - w_1)(z - w_2)(z - w_3) + \cdots.$$

If $\gamma$ is a finite ordinal, then $f_\gamma$ is a polynomial and hence analytic, and we can certainly choose numbers $\varepsilon_i$ such that (2) is satisfied. Now suppose $\gamma$ is an infinite countable ordinal, then
$$f_\gamma(z) = \sum_{n=0}^{\infty} \varepsilon_n (z - w_1) \cdots (z - w_n). \qquad (3)$$

Note that the values of $\varepsilon_m$ $(m \ge n)$ have no influence on the value $f_\gamma(w_n)$, hence we may choose the $\varepsilon_n$ step by step. If the sequence $(\varepsilon_n)$ converges to 0 sufficiently fast, then (3) defines an analytic function. Finally, since $D$ is a dense set, we may choose this sequence $(\varepsilon_n)$ so that $f_\gamma$ meets the requirements of (2), and the proof is complete. □
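The step-by-step choice of the $\varepsilon_n$ can be tried out for finitely many nodes. The following Python sketch is only an illustration under simplifying assumptions, and it is not part of the proof. The nodes $w_n$, the values $g_n(w_n)$ to be avoided, and the admissible target values are all exact rationals, with the rationals standing in for the dense countable set $D$. Only finitely many coefficients are chosen, so the convergence requirement on $(\varepsilon_n)$ does not arise. The function names are hypothetical.

```python
from fractions import Fraction


def newton_eval(eps, nodes, z):
    """Evaluate eps[0] + eps[1](z - w_1) + eps[2](z - w_1)(z - w_2) + ...,
    a finite truncation of (3); nodes[k] plays the role of w_{k+1}."""
    total, product = Fraction(0), Fraction(1)
    for k, e in enumerate(eps):
        total += e * product
        if k < len(nodes):
            product *= z - nodes[k]
    return total


def choose_coefficients(nodes, forbidden, candidates):
    """Choose eps_0, eps_1, ... one at a time so that the value at nodes[n]
    is a candidate different from forbidden[n] (a stand-in for g_n(w_n))."""
    eps = []
    for n, w in enumerate(nodes):
        # The value at nodes[n] depends only on eps_0, ..., eps_n; eps_n enters
        # with the factor (w - nodes[0]) ... (w - nodes[n-1]), nonzero for distinct nodes.
        base = newton_eval(eps, nodes, w)
        factor = newton_eval([Fraction(0)] * n + [Fraction(1)], nodes, w)
        target = next(c for c in candidates if c != forbidden[n])
        eps.append((target - base) / factor)
    return eps


nodes = [Fraction(0), Fraction(1), Fraction(3)]
forbidden = [Fraction(0), Fraction(1), Fraction(0)]
eps = choose_coefficients(nodes, forbidden, [Fraction(0), Fraction(1)])
print(eps, [newton_eval(eps, nodes, w) for w in nodes])
```

The $n$-th coefficient is multiplied by $(z - w_1)\cdots(z - w_n)$. As a result, appending later coefficients never changes the values already fixed at earlier nodes, and this is why the choice can be made one step at a time.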
Appendix: On cardinal and ordinal numbers

Let us first discuss the question whether to each cardinal number there exists a next larger one. As a start we show that to every cardinal number $\mathfrak{m}$ there always is a cardinal number $\mathfrak{n}$ larger than $\mathfrak{m}$. To do this we employ again a version of Cantor's diagonalization method.

Let $M$ be a set, then we claim that the set $P(M)$ of all subsets of $M$ has larger size than $M$. By letting $m \in M$ correspond to $\{m\} \in P(M)$, we see that $M$ can be mapped bijectively onto a subset of $P(M)$, which implies $|M| \le |P(M)|$ by definition. It remains to show that $P(M)$ cannot be mapped bijectively onto a subset of $M$. Suppose, on the contrary, $\varphi: N \to P(M)$ is a bijection of $N \subseteq M$ onto $P(M)$. Consider the subset $U \subseteq N$ of all elements of $N$ which are not contained in their image under $\varphi$, that is, $U = \{m \in N : m \notin \varphi(m)\}$. Since $\varphi$ is a bijection, there exists $u \in N$ with $\varphi(u) = U$. Now, either $u \in U$ or $u \notin U$, but both alternatives are impossible! Indeed, if $u \in U$, then $u \notin \varphi(u) = U$ by the definition of $U$, and if $u \notin U = \varphi(u)$, then $u \in U$, contradiction.
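For a finite set $M$ the diagonal argument can even be checked exhaustively. The Python sketch below uses hypothetical helper names and is only feasible for very small $M$. It is also restricted to maps defined on all of $M$, that is, the case $N = M$. It runs through every map $\varphi: M \to P(M)$ and confirms that the diagonal set $U$ is never among the images.

```python
from itertools import chain, combinations, product


def power_set(elements):
    """All subsets of `elements`, as frozensets."""
    elements = list(elements)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(elements, r) for r in range(len(elements) + 1))]


def diagonal_set(phi):
    """U = {m : m not in phi(m)} for a map phi given as a dict m -> subset."""
    return frozenset(m for m, image in phi.items() if m not in image)


def no_map_hits_diagonal(M):
    """Check every map phi: M -> P(M); the diagonal set U is never in the image."""
    elements = sorted(M)
    for images in product(power_set(M), repeat=len(elements)):
        phi = dict(zip(elements, images))
        if diagonal_set(phi) in phi.values():
            return False
    return True


print(no_map_hits_diagonal({0, 1, 2}))  # checks all 8**3 = 512 maps
```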

Most likely, the reader has seen this argument before. It is the old barber riddle: "A barber is the man who shaves all men who do not shave themselves. Does the barber shave himself?"


To get further in the theory we introduce another great concept of Cantor's, ordered sets and ordinal numbers. A set $M$ is ordered by $<$ if the relation $<$ is transitive, and if for any two distinct elements $a$ and $b$ of $M$ we either have $a < b$ or $b < a$. For example, we can order $\mathbb{N}$ in the usual way according to magnitude, $\mathbb{N} = \{1, 2, 3, 4, \ldots\}$, but, of course, we can also order $\mathbb{N}$ the other way round, $\mathbb{N} = \{\ldots, 4, 3, 2, 1\}$, or $\mathbb{N} = \{1, 3, 5, \ldots, 2, 4, 6, \ldots\}$ by listing first the odd numbers and then the even numbers.

Here is the seminal concept. An ordered set $M$ is called well-ordered if every nonempty subset of $M$ has a first element. Thus the first and third orderings of $\mathbb{N}$ above are well-orderings, but not the second ordering. The fundamental well-ordering theorem, implied by the axioms (including the axiom of choice), now states that every set $M$ admits a well-ordering. From now on, we only consider sets endowed with a well-ordering.

Let us say that two well-ordered sets $M$ and $N$ are similar (or of the same order-type) if there exists a bijection $\varphi$ from $M$ onto $N$ which respects the ordering, that is, $m <_M n$ implies $\varphi(m) <_N \varphi(n)$. Note that any ordered set which is similar to a well-ordered set is itself well-ordered.

Similarity is obviously an equivalence relation, and we can thus speak of an ordinal number $\alpha$ belonging to a class of similar sets. For finite sets, any two orderings are similar well-orderings, and we use again the ordinal number $n$ for the class of $n$-sets. Note that, by definition, two similar sets have the same cardinality. Hence it makes sense to speak of the cardinality $|\alpha|$ of an ordinal number $\alpha$. Note further that any subset of a well-ordered set is also well-ordered under the induced ordering.

As we did for cardinal numbers, we now compare ordinal numbers. Let $M$ be a well-ordered set, $m \in M$, then $M_m = \{x \in M : x < m\}$ is called the (initial) segment of $M$ determined by $m$; $N$ is a segment of $M$ if $N = M_m$ for some $m$. Thus, in particular, $M_m$ is the empty set when $m$ is the first element of $M$. Now let $\mu$ and $\nu$ be the ordinal numbers of the well-ordered sets $M$ and $N$. We say that $\mu$ is smaller than $\nu$, $\mu < \nu$, if $M$ is similar to a segment of $N$. Again, we have the transitive law that $\mu < \nu$, $\nu < \pi$ implies $\mu < \pi$, since under a similarity mapping a segment is mapped onto a segment.

"A legend talks about St. Augustin who, walking along the seashore and contemplating infinity, saw a child trying to empty the ocean with a small shell..."

The well-ordered sets $\mathbb{N} = \{1, 2, 3, \ldots\}$ and $\mathbb{N} = \{1, 3, 5, \ldots, 2, 4, 6, \ldots\}$ are not similar: the first ordering has only one element without an immediate predecessor, while the second one has two.

The ordinal number of $\{1, 2, 3, \ldots\}$ is smaller than the ordinal number of $\{1, 3, 5, \ldots, 2, 4, 6, \ldots\}$.

Clearly, for finite sets, $m < n$ corresponds to the usual meaning. Let us denote by $\omega$ the ordinal number of $\mathbb{N} = \{1, 2, 3, 4, \ldots\}$ ordered according to magnitude. By considering the segment $\mathbb{N}_{n+1}$ we find $n < \omega$ for any finite $n$. Next we see that $\omega \le \alpha$ holds for any infinite ordinal number $\alpha$. Indeed, if the infinite well-ordered set $M$ has ordinal number $\alpha$, then $M$ contains a first element $m_1$, the set $M \setminus \{m_1\}$ contains a first element $m_2$, $M \setminus \{m_1, m_2\}$ contains a first element $m_3$. Continuing in this way, we produce the sequence $m_1 < m_2 < m_3 < \cdots$ in $M$. If $M = \{m_1, m_2, m_3, \ldots\}$, then $M$ is similar to $\mathbb{N}$, and hence $\alpha = \omega$. If, on the other hand, $M \setminus \{m_1, m_2, \ldots\}$ is nonempty, then it contains a first element $m$, and we conclude that $\mathbb{N}$ is similar to the segment $M_m$, that is, $\omega < \alpha$ by definition.

We now state (without the proofs, which are not difficult) three basic results on ordinal numbers. The first says that any ordinal number $\mu$ has a "standard" representative well-ordered set $W_\mu$.

Proposition 1. Let $\mu$ be an ordinal number and denote by $W_\mu$ the set of ordinal numbers smaller than $\mu$. Then the following holds:

(i) The elements of $W_\mu$ are pairwise comparable.

(ii) If we order $W_\mu$ according to magnitude, then $W_\mu$ is well-ordered and has ordinal number $\mu$.

Proposition 2. Any two ordinal numbers $\mu$ and $\nu$ satisfy precisely one of the relations $\mu < \nu$, $\mu = \nu$, or $\mu > \nu$.


Proposition 3. Every set of ordinal numbers (ordered according to 
magnitude) is well-ordered. 


After this excursion to ordinal numbers we come back to cardinal numbers. Let $\mathfrak{m}$ be a cardinal number, and denote by $O_{\mathfrak{m}}$ the set of all ordinal numbers $\mu$ with $|\mu| = \mathfrak{m}$. By Proposition 3 there is a smallest ordinal number $\omega_{\mathfrak{m}}$ in $O_{\mathfrak{m}}$, which we call the initial ordinal number of $\mathfrak{m}$. As an example, $\omega$ is the initial ordinal number of $\aleph_0$.


With these preparations we can now prove a basic result for this chapter. 


Proposition 4. For every cardinal number $\mathfrak{m}$ there is a definite next larger cardinal number.


■ Proof. We already know that there is some larger cardinal number $\mathfrak{n}$. Consider now the set $K$ of all cardinal numbers larger than $\mathfrak{m}$ and at most as large as $\mathfrak{n}$. We associate to each $\mathfrak{p} \in K$ its initial ordinal number $\omega_{\mathfrak{p}}$. Among these initial numbers there is a smallest (Proposition 3), and the corresponding cardinal number is then the smallest in $K$, and thus is the desired next larger cardinal number to $\mathfrak{m}$. □

Proposition 5. Let the infinite set $M$ have cardinality $\mathfrak{m}$, and let $M$ be well-ordered according to the initial ordinal number $\omega_{\mathfrak{m}}$. Then $M$ has no last element.

■ Proof. Indeed, if $M$ had a last element $m$, then the segment $M_m = M \setminus \{m\}$ would have an ordinal number $\mu < \omega_{\mathfrak{m}}$ with $|\mu| = \mathfrak{m}$ (removing a single element from an infinite set does not change its cardinality), contradicting the definition of $\omega_{\mathfrak{m}}$. □


What we finally need is a considerable strengthening of the result that the union of countably many countable sets is again countable. In the following result we consider arbitrary families of countable sets.

Proposition 6. Suppose $\{A_\alpha\}$ is a family of size $\mathfrak{m}$ of countable sets $A_\alpha$, where $\mathfrak{m}$ is an infinite cardinal. Then the union $\bigcup_\alpha A_\alpha$ has size at most $\mathfrak{m}$.

■ Proof. We may assume that the sets $A_\alpha$ are pairwise disjoint, since this can only increase the size of the union. Let $M$ with $|M| = \mathfrak{m}$ be the index set, and well-order it according to the initial ordinal number $\omega_{\mathfrak{m}}$. We now replace each $\alpha \in M$ by a countable set $B_\alpha = \{b_{\alpha 1} = \alpha, b_{\alpha 2}, b_{\alpha 3}, \ldots\}$, ordered according to $\omega$, and call the new set $\overline{M}$. Then $\overline{M}$ is again well-ordered by setting $b_{\alpha i} < b_{\beta j}$ for $\alpha < \beta$ and $b_{\alpha i} < b_{\alpha j}$ for $i < j$. Let $\overline{\mu}$ be the ordinal number of $\overline{M}$. Since $M$ is a subset of $\overline{M}$, we have $\omega_{\mathfrak{m}} \le \overline{\mu}$ by an earlier argument. If $\omega_{\mathfrak{m}} = \overline{\mu}$, then $M$ is similar to $\overline{M}$, and if $\omega_{\mathfrak{m}} < \overline{\mu}$, then $M$ is similar to a segment of $\overline{M}$. Now, since the ordering $\omega_{\mathfrak{m}}$ of $M$ has no last element (Proposition 5), we see that $M$ is in both cases similar to the union of countable sets $B_\beta$, and hence of the same cardinality.

The rest is easy. Let $\varphi: \bigcup B_\beta \to M$ be a bijection, and suppose that $\varphi(B_\beta) = \{\alpha_1, \alpha_2, \alpha_3, \ldots\}$. Replace each $\alpha_i$ by $A_{\alpha_i}$, and consider the union $\bigcup_i A_{\alpha_i}$. Since $\bigcup_i A_{\alpha_i}$ is the union of countably many countable sets (and hence countable), we see that $B_\beta$ has the same size as $\bigcup_i A_{\alpha_i}$. In other words, there is a bijection from $B_\beta$ to $\bigcup_i A_{\alpha_i}$ for all $\beta$, and hence a bijection $\psi$ from $\bigcup B_\beta$ to $\bigcup A_\alpha$. But now $\psi\varphi^{-1}$ gives the desired bijection from $M$ to $\bigcup A_\alpha$, and thus $|\bigcup A_\alpha| = \mathfrak{m}$. □
In praise of inequalities

Analysis abounds with inequalities, as witnessed for example by the famous book "Inequalities" by Hardy, Littlewood and Pólya. Let us single out two of the most basic inequalities with two applications each, and let us listen in to George Pólya, who was himself a champion of the Book Proof, about what he considers the most appropriate proofs.

Our first inequality is variously attributed to Cauchy, Schwarz and/or to 
Buniakowski: 


Theorem I (Cauchy–Schwarz inequality)
Let $\langle a, b\rangle$ be an inner product on a real vector space $V$ (with the norm $|a|^2 := \langle a, a\rangle$). Then
$$\langle a, b\rangle^2 \le |a|^2 |b|^2$$
holds for all vectors $a, b \in V$, with equality if and only if $a$ and $b$ are linearly dependent.

■ Proof. The following (folklore) proof is probably the shortest. Consider the quadratic function
$$|xa + b|^2 = x^2 |a|^2 + 2x\langle a, b\rangle + |b|^2$$
in the variable $x$. We may assume $a \ne 0$. If $b = \lambda a$, then clearly $\langle a, b\rangle^2 = |a|^2 |b|^2$. If, on the other hand, $a$ and $b$ are linearly independent, then $|xa + b|^2 > 0$ for all $x$, and thus the discriminant $4\langle a, b\rangle^2 - 4|a|^2|b|^2$ is less than 0. □
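The discriminant argument is easy to test on concrete vectors. This Python sketch assumes the standard inner product on $\mathbb{R}^k$ with integer entries, so all quantities are exact. The function `quarter_discriminant` (a hypothetical name) returns $\langle a,b\rangle^2 - |a|^2|b|^2$, which should vanish for dependent vectors and be negative otherwise.

```python
def dot(a, b):
    """Standard inner product on R^k."""
    return sum(x * y for x, y in zip(a, b))


def quadratic(a, b, x):
    """|x a + b|^2, the quadratic function from the proof."""
    v = [x * ai + bi for ai, bi in zip(a, b)]
    return dot(v, v)


def quarter_discriminant(a, b):
    """<a,b>^2 - |a|^2 |b|^2, one quarter of the discriminant of
    x -> |a|^2 x^2 + 2<a,b> x + |b|^2."""
    return dot(a, b) ** 2 - dot(a, a) * dot(b, b)


print(quarter_discriminant([1, 2, 3], [2, 4, 6]))  # dependent vectors: 0
print(quarter_discriminant([1, 0, 2], [0, 1, 1]))  # independent vectors: -6
```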


Our second example is the inequality of the harmonic, geometric and arithmetic mean:

Theorem II (Harmonic, geometric and arithmetic mean)
Let $a_1, \ldots, a_n$ be positive real numbers, then
$$\frac{n}{\frac{1}{a_1} + \cdots + \frac{1}{a_n}} \le \sqrt[n]{a_1 a_2 \cdots a_n} \le \frac{a_1 + \cdots + a_n}{n}$$
with equality in both cases if and only if all $a_i$'s are equal.

■ Proof. The following beautiful nonstandard induction proof is attributed to Cauchy (see [8]). Let $P(n)$ be the statement of the second inequality, written in the form
$$a_1 a_2 \cdots a_n \le \left(\frac{a_1 + \cdots + a_n}{n}\right)^n.$$
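Note that we have proved the inequality $x \ge 1 + \log x$ for $x > 0$ on the side.

For $n = 2$, we have $a_1 a_2 \le \left(\frac{a_1 + a_2}{2}\right)^2 \iff (a_1 - a_2)^2 \ge 0$, which is true. Now we proceed in the following two steps:

(A) $P(n) \Longrightarrow P(n-1)$

(B) $P(n)$ and $P(2) \Longrightarrow P(2n)$

which will clearly imply the full result.

To prove (A), set $A := \sum_{k=1}^{n-1} \frac{a_k}{n-1}$, then
$$\Big(\prod_{k=1}^{n-1} a_k\Big) A \;\overset{P(n)}{\le}\; \left(\frac{\sum_{k=1}^{n-1} a_k + A}{n}\right)^n = \left(\frac{(n-1)A + A}{n}\right)^n = A^n,$$
and hence
$$\prod_{k=1}^{n-1} a_k \le A^{n-1} = \left(\frac{\sum_{k=1}^{n-1} a_k}{n-1}\right)^{n-1}.$$

For (B), we see
$$\prod_{k=1}^{2n} a_k = \Big(\prod_{k=1}^{n} a_k\Big)\Big(\prod_{k=n+1}^{2n} a_k\Big) \;\overset{P(n)}{\le}\; \left(\sum_{k=1}^{n} \frac{a_k}{n}\right)^n \left(\sum_{k=n+1}^{2n} \frac{a_k}{n}\right)^n \;\overset{P(2)}{\le}\; \left(\frac{1}{2}\sum_{k=1}^{2n} \frac{a_k}{n}\right)^{2n} = \left(\frac{\sum_{k=1}^{2n} a_k}{2n}\right)^{2n}.$$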


The condition for equality is derived just as easily. 


The left-hand inequality, between the harmonic and the geometric mean, follows now by considering $\frac{1}{a_1}, \ldots, \frac{1}{a_n}$. □
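A quick numerical check of Theorem II, together with the padding trick of step (A), might look as follows. This is a Python sketch and is not part of the proof. The function `means` uses floating point. The function `pad_with_mean` assumes integer inputs, so that the appended mean $A$ is an exact `Fraction`. Both names are hypothetical.

```python
import math
from fractions import Fraction


def means(a):
    """Harmonic, geometric and arithmetic mean of positive numbers."""
    n = len(a)
    harmonic = n / sum(1 / x for x in a)
    geometric = math.prod(a) ** (1 / n)
    arithmetic = sum(a) / n
    return harmonic, geometric, arithmetic


def pad_with_mean(a):
    """Step (A): append A = (a_1 + ... + a_{n-1})/(n-1); the mean of the padded list is still A."""
    A = Fraction(sum(a), len(a))
    return list(a) + [A], A


print(means([1, 2, 4]))
padded, A = pad_with_mean([1, 2, 6])
print(padded, A, math.prod([1, 2, 6]) * A <= A ** 4)
```

The padded list $a_1, \ldots, a_{n-1}, A$ has arithmetic mean $A$ again. This is what makes $P(n)$ applicable in step (A).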


■ Another Proof. Of the many other proofs of the arithmetic-geometric mean inequality (the monograph [2] lists more than fifty), let us single out a particularly striking one by Horst Alzer, with some shortenings due to France Dacar. As a matter of fact, this proof yields the stronger inequality
$$a_1^{p_1} a_2^{p_2} \cdots a_n^{p_n} \le p_1 a_1 + p_2 a_2 + \cdots + p_n a_n$$
for any positive numbers $a_1, \ldots, a_n, p_1, \ldots, p_n$ with $\sum_{i=1}^n p_i = 1$. Let us denote the expression on the left side by $G$, and on the right side by $A$. Fix $c > 0$ and define the function $f(t) := \frac{1}{c} - \frac{1}{t}$ on $\mathbb{R}_{>0}$. Since $f(t) < 0$ for $t < c$ and $f(t) > 0$ for $t > c$, we get the inequality
$$\int_c^x f(t)\,dt \ge 0$$
for every $x > 0$, with equality if and only if $x = c$. Now
$$0 \le \int_c^x f(t)\,dt = \Big[\frac{t}{c} - \log t\Big]_c^x = \frac{x}{c} - 1 - \log\frac{x}{c},$$
and setting $c = G$ and $x = a_i$ we conclude that
$$\frac{a_i}{G} - 1 \ge \log a_i - \log G \quad \text{for } i = 1, 2, \ldots, n. \qquad (1)$$
Multiplying this inequality by $p_i$ and summing over all $i$ gives
$$\sum_{i=1}^n \frac{p_i a_i}{G} - \sum_{i=1}^n p_i \ge \sum_{i=1}^n p_i \log a_i - \sum_{i=1}^n p_i \log G.$$
With $\sum_{i=1}^n p_i = 1$, the left side equals $\frac{A}{G} - 1$, while the right side is
$$\log\Big(\prod_{i=1}^n a_i^{p_i}\Big) - \log G = \log G - \log G = 0.$$

We conclude $\frac{A}{G} - 1 \ge 0$, which is $A \ge G$. In the case of equality, all inequalities in (1) must be equalities, which implies $a_1 = \cdots = a_n = G$. □
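The key identity in Alzer's argument can be checked numerically. With $c = G$, the weighted sum of the nonnegative gaps $\frac{a_i}{G} - 1 - \log\frac{a_i}{G}$ equals $\frac{A}{G} - 1$. The following Python sketch uses arbitrarily chosen sample values and weights (assumptions of the example) and hypothetical function names.

```python
import math


def weighted_means(a, p):
    """G = prod a_i^{p_i} and A = sum p_i a_i for positive a_i and weights p_i summing to 1."""
    G = math.prod(x ** w for x, w in zip(a, p))
    A = sum(w * x for x, w in zip(a, p))
    return G, A


def gap(x, c):
    """x/c - 1 - log(x/c), the integral of 1/c - 1/t from c to x; >= 0, and 0 iff x == c."""
    return x / c - 1 - math.log(x / c)


a, p = [1.0, 4.0, 9.0], [0.2, 0.3, 0.5]
G, A = weighted_means(a, p)
# Summing p_i * gap(a_i, G) reproduces A/G - 1, because sum p_i log(a_i / G) = 0.
print(G, A, sum(w * gap(x, G) for x, w in zip(a, p)), A / G - 1)
```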


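■ Still another Proof. There is another nice proof, due to Michael D. Hirschhorn. It uses Bernoulli's inequality, which says
$$(1 + t)^{n+1} \ge 1 + (n+1)t \quad \text{for real } t > -1.$$

Suppose $a_1, a_2, \ldots, a_{n+1} > 0$ and set
$$t = \frac{\frac{a_1 + \cdots + a_{n+1}}{n+1}}{\frac{a_1 + \cdots + a_n}{n}} - 1.$$
By Bernoulli,
$$\left(\frac{\frac{a_1 + \cdots + a_{n+1}}{n+1}}{\frac{a_1 + \cdots + a_n}{n}}\right)^{n+1} \ge 1 + (n+1)\left(\frac{\frac{a_1 + \cdots + a_{n+1}}{n+1}}{\frac{a_1 + \cdots + a_n}{n}} - 1\right) = 1 + n\,\frac{a_1 + \cdots + a_{n+1}}{a_1 + \cdots + a_n} - (n+1) = \frac{n\,a_{n+1}}{a_1 + \cdots + a_n},$$
which translates into
$$\left(\frac{a_1 + \cdots + a_{n+1}}{n+1}\right)^{n+1} \ge \left(\frac{a_1 + \cdots + a_n}{n}\right)^{n} a_{n+1},$$
and the arithmetic-geometric mean inequality follows by induction. □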
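Hirschhorn's inductive step can be verified in exact rational arithmetic. The Python sketch below uses hypothetical names and assumes positive integer inputs, so that `Fraction` keeps everything exact. The function `hirschhorn_sides` returns both sides of the final inequality. The function `bernoulli_bound` computes $1 + (n+1)t$, which should reduce to $\frac{n a_{n+1}}{a_1 + \cdots + a_n}$.

```python
from fractions import Fraction


def hirschhorn_sides(a):
    """For positive integers a_1..a_{n+1}, return both sides of
    ((a_1+...+a_{n+1})/(n+1))^(n+1) >= ((a_1+...+a_n)/n)^n * a_{n+1}."""
    *head, last = a
    n = len(head)
    lhs = Fraction(sum(a), n + 1) ** (n + 1)
    rhs = Fraction(sum(head), n) ** n * last
    return lhs, rhs


def bernoulli_bound(a):
    """1 + (n+1) t for the t of the proof."""
    *head, last = a
    n = len(head)
    t = Fraction(sum(a), n + 1) / Fraction(sum(head), n) - 1
    return 1 + (n + 1) * t


print(hirschhorn_sides([1, 2, 3]), bernoulli_bound([1, 2, 3]))
```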
Our first application is a beautiful result of Laguerre (see [8]) concerning the location of roots of polynomials.

Theorem 1. Suppose all roots of the polynomial $x^n + a_{n-1}x^{n-1} + \cdots + a_0$ are real. Then the roots are contained in the interval with the endpoints
$$-\frac{a_{n-1}}{n} \pm \frac{n-1}{n}\sqrt{a_{n-1}^2 - \frac{2n}{n-1}\,a_{n-2}}.$$


■ Proof. Let $y$ be one of the roots and $y_1, \ldots, y_{n-1}$ the others. Then the polynomial is $(x - y)(x - y_1)\cdots(x - y_{n-1})$. Thus by comparing coefficients
$$-a_{n-1} = y + y_1 + \cdots + y_{n-1},$$
$$a_{n-2} = y(y_1 + \cdots + y_{n-1}) + \sum_{i<j} y_i y_j,$$
and so
$$a_{n-1}^2 - 2a_{n-2} - y^2 = \sum_{i=1}^{n-1} y_i^2.$$
By Cauchy's inequality applied to $(y_1, \ldots, y_{n-1})$ and $(1, \ldots, 1)$,
$$(a_{n-1} + y)^2 = (y_1 + y_2 + \cdots + y_{n-1})^2 \le (n-1)\sum_{i=1}^{n-1} y_i^2 = (n-1)(a_{n-1}^2 - 2a_{n-2} - y^2),$$
or
$$y^2 + \frac{2a_{n-1}}{n}\,y + \frac{2(n-1)}{n}\,a_{n-2} - \frac{n-2}{n}\,a_{n-1}^2 \le 0.$$

Thus $y$ (and hence all $y_i$) lie between the two roots of the quadratic function, and these roots are our bounds. □
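Laguerre's interval depends only on the two top coefficients, so it is easy to compute from a list of real roots. The following Python sketch builds $a_{n-1}$ and $a_{n-2}$ from the roots and returns the two endpoints. It uses hypothetical helper names, a floating-point square root, and assumes $n \ge 2$.

```python
import math
from itertools import combinations


def top_coefficients(roots):
    """a_{n-1} and a_{n-2} of the monic polynomial prod (x - r)."""
    a_n1 = -sum(roots)
    a_n2 = sum(x * y for x, y in combinations(roots, 2))
    return a_n1, a_n2


def laguerre_interval(a_n1, a_n2, n):
    """Endpoints -a_{n-1}/n -+ (n-1)/n * sqrt(a_{n-1}^2 - 2n/(n-1) * a_{n-2})."""
    center = -a_n1 / n
    radius = (n - 1) / n * math.sqrt(a_n1 ** 2 - 2 * n / (n - 1) * a_n2)
    return center - radius, center + radius


roots = [1, 2, 3, 4]
low, high = laguerre_interval(*top_coefficients(roots), len(roots))
print(low, high)  # every root lies in [low, high]
```

For the roots $0, 3, 3$ the left endpoint is exactly $0$. Equality in Cauchy's inequality requires the remaining roots to be equal, so the bound is attained there.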


For our second application we start from a well-known elementary property of a parabola. Consider the parabola described by $f(x) = 1 - x^2$ between $x = -1$ and $x = 1$. We associate to $f(x)$ the tangential triangle and the tangential rectangle as in the figure.

We find that the shaded area
$$A = \int_{-1}^{1} (1 - x^2)\,dx$$
is equal to $\frac{4}{3}$, and the areas $T$ and $R$ of the triangle and rectangle are both equal to 2. Thus $\frac{T}{A} = \frac{3}{2}$ and $\frac{R}{A} = \frac{3}{2}$.

In a beautiful paper, Paul Erdős and Tibor Gallai asked what happens when $f(x)$ is an arbitrary $n$-th degree real polynomial with $f(x) > 0$ for $-1 < x < 1$, and $f(-1) = f(1) = 0$. The area $A$ is then $\int_{-1}^{1} f(x)\,dx$.
Suppose that $f(x)$ assumes in $(-1, 1)$ its maximum value at $b$, then $R = 2f(b)$. Computing the tangents at $-1$ and at $1$, it is readily seen (see the box below) that
$$T = \frac{2 f'(1) f'(-1)}{f'(1) - f'(-1)}, \qquad (2)$$
respectively $T = 0$ for $f'(1) = f'(-1) = 0$.

The tangential triangle

The area $T$ of the tangential triangle is precisely $y_0$, where $(x_0, y_0)$ is the point of intersection of the two tangents. The equations of these tangents are $y = f'(-1)(x + 1)$ and $y = f'(1)(x - 1)$, hence
$$x_0 = \frac{f'(1) + f'(-1)}{f'(1) - f'(-1)},$$
and thus
$$y_0 = f'(1)\left(\frac{f'(1) + f'(-1)}{f'(1) - f'(-1)} - 1\right) = \frac{2 f'(1) f'(-1)}{f'(1) - f'(-1)}.$$


In general, there are no nontrivial bounds for $\frac{T}{A}$ and $\frac{R}{A}$. To see this, take $f(x) = 1 - x^{2n}$. Then $T = 2n$, $A = \frac{4n}{2n+1}$, and thus $\frac{T}{A} > n$. Similarly, $R = 2$ and $\frac{R}{A} = \frac{2n+1}{2n}$, which approaches 1 with $n$ to infinity.

But, as Erdős and Gallai showed, for polynomials which have only real roots such bounds do indeed exist.
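The example $f(x) = 1 - x^{2n}$ can be evaluated exactly with the tangent formula from the box. Here is a short Python sketch that returns $T$, $A$ and $R$ as fractions. The function name is hypothetical.

```python
from fractions import Fraction


def areas_for_power(n):
    """T, A, R for f(x) = 1 - x^(2n) on [-1, 1], computed exactly."""
    fp1, fm1 = -2 * n, 2 * n                  # f'(1) and f'(-1)
    T = Fraction(2 * fp1 * fm1, fp1 - fm1)    # T = 2 f'(1) f'(-1) / (f'(1) - f'(-1))
    A = 2 - Fraction(2, 2 * n + 1)            # integral of 1 - x^(2n) over [-1, 1]
    R = Fraction(2)                           # R = 2 max f = 2 f(0)
    return T, A, R


for n in (1, 2, 5, 50):
    T, A, R = areas_for_power(n)
    print(n, T / A, R / A)
```

The ratio $T/A = \frac{2n+1}{2}$ grows without bound, while $R/A = \frac{2n+1}{2n}$ tends to 1.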


Theorem 2. Let $f(x)$ be a real polynomial of degree $n \ge 2$ with only real roots, such that $f(x) > 0$ for $-1 < x < 1$ and $f(-1) = f(1) = 0$. Then
$$\frac{2}{3}\,T \le A \le \frac{2}{3}\,R,$$
and equality holds in both cases only for $n = 2$.

Erdős and Gallai established this result with an intricate induction proof. In the review of their paper, which appeared on the first page of the first issue of the Mathematical Reviews in 1940, George Pólya explained how the first inequality can also be proved by the inequality of the arithmetic and geometric mean: a beautiful example of a conscientious review and a Book Proof at the same time.




[Facsimile of the review in Mathematical Reviews]

Erdős, P. and Grünwald, T. On polynomials with only real roots. Ann. of Math. 40, 537–548 (1939). [MF 93]
Let $f(x)$ be a polynomial with only real roots,
$$f(-1) = f(1) = 0, \qquad 0 < f(x) \le f(\mu) \quad \text{for } -1 < x < 1,$$
where $-1 < \mu < 1$, so that $\mu$ denotes the location of the maximum of $f(x)$ in the interval $(-1, 1)$. Then
$$\frac{2}{3}\cdot\frac{2 f'(1) f'(-1)}{f'(1) - f'(-1)} \le \int_{-1}^{1} f(x)\,dx \le \frac{2}{3}\cdot 2 f(\mu),$$
“a 
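■ Proof of $\frac{2}{3}T \le A$. Since $f(x)$ has only real roots, and none of them in the open interval $(-1, 1)$, it can be written (apart from a constant positive factor which cancels out in the end) in the form
$$f(x) = (1 - x^2)\prod_i (\alpha_i - x)\prod_j (\beta_j + x) \qquad (3)$$
with $\alpha_i \ge 1$, $\beta_j \ge 1$. Hence
$$A = \int_{-1}^{1} (1 - x^2)\prod_i (\alpha_i - x)\prod_j (\beta_j + x)\,dx.$$

By making the substitution $x \mapsto -x$, we find that also
$$A = \int_{-1}^{1} (1 - x^2)\prod_i (\alpha_i + x)\prod_j (\beta_j - x)\,dx,$$
and hence by the inequality of the arithmetic and the geometric mean (note that all factors are $\ge 0$)
$$\begin{aligned}
A &= \int_{-1}^{1} \frac{1}{2}\Big[(1 - x^2)\prod_i (\alpha_i - x)\prod_j (\beta_j + x) + (1 - x^2)\prod_i (\alpha_i + x)\prod_j (\beta_j - x)\Big]dx \\
&\ge \int_{-1}^{1} (1 - x^2)\Big(\prod_i (\alpha_i^2 - x^2)\prod_j (\beta_j^2 - x^2)\Big)^{1/2} dx \\
&\ge \int_{-1}^{1} (1 - x^2)\Big(\prod_i (\alpha_i^2 - 1)\prod_j (\beta_j^2 - 1)\Big)^{1/2} dx \\
&= \frac{4}{3}\Big(\prod_i (\alpha_i^2 - 1)\prod_j (\beta_j^2 - 1)\Big)^{1/2}.
\end{aligned}$$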


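Let us compute $f'(1)$ and $f'(-1)$. (We may assume $f'(-1), f'(1) \ne 0$, since otherwise $T = 0$ and the inequality $\frac{2}{3}T \le A$ becomes trivial.) By (3) we see
$$f'(1) = -2\prod_i (\alpha_i - 1)\prod_j (\beta_j + 1),$$
and similarly
$$f'(-1) = 2\prod_i (\alpha_i + 1)\prod_j (\beta_j - 1).$$

Hence we conclude
$$A \ge \frac{2}{3}\big(-f'(1)\,f'(-1)\big)^{1/2}.$$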
Applying now the inequality of the harmonic and the geometric mean to $-f'(1)$ and $f'(-1)$, we arrive by (2) at the conclusion
$$A \ge \frac{2}{3}\cdot\frac{2}{\frac{1}{-f'(1)} + \frac{1}{f'(-1)}} = \frac{4}{3}\cdot\frac{f'(1)\,f'(-1)}{f'(1) - f'(-1)} = \frac{2}{3}\,T,$$
which is what we wanted to show. By analyzing the case of equality in all our inequalities the reader can easily supply the last statement of the theorem. □
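The inequality $\frac{2}{3}T \le A$ can be tested exactly for polynomials of the form (3). The roots outside $(-1,1)$ are chosen by hand, which is an assumption of the example. In this Python sketch the polynomial is expanded with `Fraction` coefficients, $A$ is integrated term by term, and $T$ is taken from formula (2). All helper names are hypothetical.

```python
from fractions import Fraction


def times_linear(coeffs, r):
    """Multiply a polynomial (coefficients, lowest degree first) by (x - r)."""
    shifted = [Fraction(0)] + coeffs                    # x * p(x)
    scaled = [-r * c for c in coeffs] + [Fraction(0)]   # -r * p(x)
    return [s + t for s, t in zip(shifted, scaled)]


def real_rooted(outer_roots):
    """f(x) = +-(x - 1)(x + 1) prod (x - r), sign chosen so that f(0) > 0.
    Every r should satisfy |r| >= 1, so that f > 0 on (-1, 1)."""
    coeffs = [Fraction(1)]
    for r in [1, -1, *outer_roots]:
        coeffs = times_linear(coeffs, Fraction(r))
    return coeffs if coeffs[0] > 0 else [-c for c in coeffs]


def evaluate(coeffs, x):
    value = Fraction(0)
    for c in reversed(coeffs):
        value = value * x + c
    return value


def area_A(coeffs):
    """Exact integral over [-1, 1]: x^k contributes 2/(k+1) for even k, 0 for odd k."""
    return sum(c * Fraction(2, k + 1) for k, c in enumerate(coeffs) if k % 2 == 0)


def area_T(coeffs):
    """Formula (2): T = 2 f'(1) f'(-1) / (f'(1) - f'(-1)), and T = 0 if both vanish."""
    deriv = [k * c for k, c in enumerate(coeffs)][1:]
    s, t = evaluate(deriv, 1), evaluate(deriv, -1)
    return Fraction(0) if s == t else 2 * s * t / (s - t)


for outer in ([], [2, -3], [1.5, 4, -2]):
    f = real_rooted(outer)
    print(outer, area_T(f), area_A(f), Fraction(2, 3) * area_T(f) <= area_A(f))
```

The parabola $1 - x^2$ (no outer roots) gives equality, which matches the case $n = 2$ of Theorem 2.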


The reader is invited to search for an equally inspired proof of the second 
inequality in Theorem 2. 


Well, analysis is inequalities after all, but here is an example from graph theory where the use of inequalities comes in quite unexpectedly. In Chapter 41 we will discuss Turán's theorem. In the simplest case it takes on the following form.

Theorem 3. Suppose $G$ is a graph on $n$ vertices without triangles. Then $G$ has at most $\frac{n^2}{4}$ edges, and equality holds only when $n$ is even and $G$ is the complete bipartite graph $K_{n/2,n/2}$.

■ First proof. This proof, using Cauchy's inequality, is due to Mantel. Let $V = \{1, \ldots, n\}$ be the vertex set and $E$ the edge set of $G$. By $d_i$ we denote the degree of $i$, hence $\sum_{i \in V} d_i = 2|E|$ (see page 199 in the chapter on double counting). Suppose $ij$ is an edge. Since $G$ has no triangles, we find $d_i + d_j \le n$ since no vertex is a neighbor of both $i$ and $j$.


It follows that
$$\sum_{ij \in E} (d_i + d_j) \le n|E|.$$

Note that $d_i$ appears exactly $d_i$ times in the sum, so we get
$$n|E| \ge \sum_{ij \in E} (d_i + d_j) = \sum_{i \in V} d_i^2,$$
and hence with Cauchy's inequality applied to the vectors $(d_1, \ldots, d_n)$ and $(1, \ldots, 1)$,
$$n|E| \ge \sum_{i \in V} d_i^2 \ge \frac{\big(\sum_{i \in V} d_i\big)^2}{n} = \frac{4|E|^2}{n},$$
and the result follows. In the case of equality we find $d_i = d_j$ for all $i, j$, and further $d_i = \frac{n}{2}$ (since $d_i + d_j = n$). Since $G$ is triangle-free, $G = K_{n/2,n/2}$ is immediately seen from this. □
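For very small $n$, Theorem 3 and the double-counting step of Mantel's proof can be confirmed by exhaustive search. The Python sketch below uses hypothetical names. It searches over all $2^{\binom{n}{2}}$ graphs, which is only practical up to about $n = 6$. For every triangle-free graph it finds, it also asserts $\sum_i d_i^2 \le n|E|$.

```python
from itertools import combinations


def is_triangle_free(n, edges):
    """True if no three vertices are pairwise adjacent."""
    edge_set = set(edges)
    return not any({(a, b), (a, c), (b, c)} <= edge_set
                   for a, b, c in combinations(range(n), 3))


def max_triangle_free_edges(n):
    """Exhaustive search over all graphs on vertices 0..n-1."""
    pairs = list(combinations(range(n), 2))
    best = 0
    for mask in range(1 << len(pairs)):
        edges = [pair for k, pair in enumerate(pairs) if (mask >> k) & 1]
        if not is_triangle_free(n, edges):
            continue
        degree = [0] * n
        for u, v in edges:
            degree[u] += 1
            degree[v] += 1
        # Mantel's double-counting step: the sum of d_i^2 is at most n * |E|.
        assert sum(d * d for d in degree) <= n * len(edges)
        best = max(best, len(edges))
    return best


print([max_triangle_free_edges(n) for n in range(1, 7)])  # [0, 1, 2, 4, 6, 9]
```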
■ Second proof. The following proof of Theorem 3, using the inequality of the arithmetic and the geometric mean, is a folklore Book Proof. Let $\alpha$ be the size of a largest independent set $A$, and set $\beta = n - \alpha$. Since $G$ is triangle-free, the neighbors of a vertex $i$ form an independent set, and we infer $d_i \le \alpha$ for all $i$.

The set $B = V \setminus A$ of size $\beta$ meets every edge of $G$. Counting the edges of $G$ according to their endvertices in $B$, we obtain $|E| \le \sum_{i \in B} d_i$. The inequality of the arithmetic and geometric mean now yields
$$|E| \le \sum_{i \in B} d_i \le \alpha\beta \le \left(\frac{\alpha + \beta}{2}\right)^2 = \frac{n^2}{4},$$
and again the case of equality is easily dealt with. □
The fundamental theorem of algebra


Every nonconstant polynomial with complex coefficients has at least 
one root in the field of complex numbers. 


Gauss called this theorem, for which he gave four different proofs, the 
“fundamental theorem of algebraic equations.” It is without doubt one of 
the milestones in the history of mathematics. As Reinhold Remmert writes 
in his pertinent survey: “It was the possibility of proving this theorem in the 
complex domain that, more than anything else, paved the way for a general 
recognition of complex numbers.” 

Some of the greatest names have contributed to the subject, from Gauss 
and Cauchy to Liouville and Laplace. An article of Netto and Le Vavasseur 
lists nearly a hundred proofs. The proof that we present is one of the most 
elegant and certainly the shortest. It follows an argument of d'Alembert
and Argand and uses only some elementary properties of polynomials and
complex numbers. We are indebted to France Dacar and to Tord Sjödin for
a polished version of the proof. Essentially the same argument appears also 
in the papers of Fefferman [3] and Redheffer [5], and doubtlessly in some 
others. 


We need three facts that one learns in a first-year calculus course. 


(A) Polynomial functions are continuous. 


(B) Any complex number of absolute value 1 has an $m$-th root for any $m \ge 1$.

(C) Cauchy's minimum principle: A continuous real-valued function $f$ on a compact set $S$ assumes a minimum in $S$.

Now let $p(z) = \sum_{k=0}^{n} c_k z^k$ be a complex polynomial of degree $n \ge 1$. As the first and decisive step we prove what is variously called d'Alembert's lemma or Argand's inequality.

Lemma. If $p(a) \ne 0$, then every disk $D$ around $a$ contains an interior point $b$ with $|p(b)| < |p(a)|$.


■ Proof. We first claim that without loss of generality we may assume that $a = 0$ and $p(a) = 1$. Indeed, if this is not the case, then we define another polynomial $q(z) := \frac{p(z + a)}{p(a)}$, which satisfies $q(0) = 1$. Now assume that every disk $D$ of radius $R$ around the origin contains a point $b$ with $|q(b)| < 1$. Then the disk $D_a$ of radius $R$ around the point $a$ contains the point $a + b$ such that $|p(a + b)| < |p(a)|$ as claimed.

Chapter 21

It has been commented upon that the "Fundamental theorem of algebra" is not really fundamental, that it is not necessarily a theorem since sometimes it serves as a definition, and that in its classical form it is not a result from algebra, but rather from analysis.


We may thus assume that $p(z) = 1 + c_1 z + c_2 z^2 + \cdots + c_n z^n$, and letting $m \ge 1$ be the smallest index with $c_m \ne 0$ we may write $p(z)$ in the form
$$p(z) = 1 + c_m z^m + z^{m+1}\big(c_{m+1} + \cdots + c_n z^{n-m-1}\big) = 1 + c_m z^m + r(z).$$
In the first step we find $0 < \rho < 1$ such that
$$|r(z)| < |c_m z^m| < 1 \quad \text{for all } 0 < |z| \le \rho. \qquad (1)$$
To get the first inequality we note that for $|z| \le 1$
$$|r(z)| \le |z|^{m+1}\big(|c_{m+1}| + \cdots + |c_n|\big) < |c_m||z^m| = |c_m z^m|,$$
provided that
$$0 < |z| < \frac{|c_m|}{|c_{m+1}| + \cdots + |c_n|} =: \rho_1$$
(if $c_{m+1} = \cdots = c_n = 0$, then $r \equiv 0$ and we may take $\rho_1 = \infty$). The second inequality holds if $|z| < |c_m|^{-1/m} =: \rho_2$; hence we conclude that (1) is valid for every $\rho$ with $0 < \rho < \min\{\rho_1, \rho_2, 1\}$.

We come to our second ingredient, the $m$-th roots provided by fact (B). Fix a constant $\rho$ as in (1) with $\rho < R$, where $R$ is the radius of the disk $D$ around $a = 0$. Let $\zeta$ be an $m$-th root of $-\frac{\overline{c_m}}{|c_m|}$, where $\overline{c_m}$ is the complex conjugate of $c_m$, and set $b := \rho\zeta$. We claim that $b$ is a desired point in $D$ with $|p(b)| < 1$. First of all, $b$ is in $D$ since $|b| = \rho < R$, and further by $|c_m|^2 = c_m\overline{c_m}$ we have
$$c_m b^m = c_m \rho^m \zeta^m = -\frac{c_m\overline{c_m}}{|c_m|}\rho^m = -|c_m|\rho^m.$$

Looking at (1) we have $|r(b)| < |c_m b^m| = |c_m|\rho^m < 1$, and hence
$$|p(b)| \le |1 + c_m b^m| + |r(b)| = 1 - |c_m|\rho^m + |r(b)| < 1,$$
and we are done. □
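The construction of the point $b$ is completely explicit, so it can be traced numerically. The following Python sketch assumes the normalized situation $a = 0$, $p(0) = 1$ (so `coeffs[0] == 1`) and a polynomial of degree at least 1. It works in floating point and picks $\rho$ as half of $\min\{\rho_1, \rho_2, 1, R\}$. The function names and the sample polynomial are hypothetical.

```python
import cmath


def evaluate(coeffs, z):
    """Horner evaluation of sum coeffs[k] * z**k."""
    value = 0
    for c in reversed(coeffs):
        value = value * z + c
    return value


def dalembert_point(coeffs, R=1.0):
    """For p(z) = 1 + c_1 z + ... + c_n z^n, return b = rho * zeta with
    |b| < R and |p(b)| < 1, following the construction in the proof."""
    m = next(k for k in range(1, len(coeffs)) if coeffs[k] != 0)
    cm = complex(coeffs[m])
    tail = sum(abs(c) for c in coeffs[m + 1:])
    rho1 = abs(cm) / tail if tail > 0 else float("inf")
    rho2 = abs(cm) ** (-1 / m)
    rho = 0.5 * min(rho1, rho2, 1.0, R)          # strictly below every bound
    w = -cm.conjugate() / abs(cm)                # a complex number with |w| = 1
    zeta = cmath.exp(1j * cmath.phase(w) / m)    # one m-th root of w
    return rho * zeta


coeffs = [1, 0, 2 - 1j, 3, -1]    # p(z) = 1 + (2 - i) z^2 + 3 z^3 - z^4
b = dalembert_point(coeffs)
print(b, abs(evaluate(coeffs, b)))
```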


The rest is easy. Clearly, $p(z)z^{-n}$ approaches the leading coefficient $c_n$ of $p(z)$ as $|z|$ goes to infinity. Hence $|p(z)|$ goes to infinity as well with $|z| \to \infty$. Consequently, there exists $R_1 > 0$ such that $|p(z)| > |p(0)|$ for all points $z$ on the circle $\{z : |z| = R_1\}$. Furthermore, our third fact (C) tells us that in the compact set $D_1 = \{z : |z| \le R_1\}$ the continuous real-valued function $|p(z)|$ attains the minimum value at some point $z_0$. Because $|p(z_0)| \le |p(0)| < |p(z)|$ for all $z$ on the boundary of $D_1$, $z_0$ must lie in the interior. But by d'Alembert's lemma this minimum value $|p(z_0)|$ must be 0, and this is the whole proof. □
[Cartoon]
"What's up this time?"
"Well, I'm shlepping 100 proofs for the Fundamental Theorem of Algebra."
"Proofs from the Book: one for the Fundamental Theorem, one for Quadratic Reciprocity!"

Chapter 22

One square and an odd number of triangles
Suppose we want to dissect a square into $n$ triangles of equal area. When $n$ is even, this is easily accomplished. For example, you could divide the horizontal sides into $\frac{n}{2}$ segments of equal length and draw a diagonal in each of the $\frac{n}{2}$ rectangles:
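The even case is easy to verify with exact arithmetic. The Python sketch below builds the dissection just described, with $\frac{n}{2}$ vertical strips each cut along a diagonal. It computes every triangle area with the shoelace formula. The function names are hypothetical.

```python
from fractions import Fraction


def triangle_area(p, q, r):
    """Area of triangle pqr via the shoelace (cross product) formula."""
    (x1, y1), (x2, y2), (x3, y3) = p, q, r
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2


def even_dissection(n):
    """Cut the unit square into n/2 vertical strips and each strip along a diagonal."""
    if n % 2:
        raise ValueError("n must be even")
    width = Fraction(2, n)
    triangles = []
    for k in range(n // 2):
        x0, x1 = k * width, (k + 1) * width
        triangles.append(((x0, 0), (x1, 0), (x1, 1)))  # below the diagonal
        triangles.append(((x0, 0), (x1, 1), (x0, 1)))  # above the diagonal
    return triangles


print({triangle_area(*t) for t in even_dissection(6)})  # {Fraction(1, 6)}
```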


But now assume n is odd. Already for n = 3 this causes problems, and 
after some experimentation you will probably come to think that it might 
not be possible. So let us pose the general problem: 


Is it possible to dissect a square into an odd number n of triangles 
of equal area? 


Now, this looks like a classical question of Euclidean geometry, and one 
could have guessed that surely the answer must have been known for a long 
time (if not to the Greeks). But when Fred Richman and John Thomas 
popularized the problem in the 1960s they found to their surprise that no 
one knew the answer or a reference where this would be discussed. 


Well, the answer is "no" not only for $n = 3$, but for any odd $n$. But how should one prove a result like this? By scaling we may, of course, restrict ourselves to the unit square with vertices $(0,0)$, $(1,0)$, $(0,1)$, $(1,1)$. Any argument must therefore somehow make use of the fact that the area of the triangles in a dissection is $\frac{1}{n}$, where $n$ is odd. The following proof due to Paul Monsky, with initial work of John Thomas, is a stroke of genius and totally unexpected: It uses an algebraic tool, valuations, to construct a striking coloring of the plane, and combines this with some elegant and stunningly simple combinatorial reasonings. And what's more: at present no other proof is known!

There are dissections of squares into an odd number of triangles whose areas are nearly equal.


Before we state the theorem let us prepare the ground by a quick study of valuations. Everybody is familiar with the absolute value function $|x|$ on the rationals $\mathbb{Q}$ (or the reals $\mathbb{R}$). It maps $\mathbb{Q}$ to the nonnegative reals such that for all $x$ and $y$,

(i) $|x| = 0$ if and only if $x = 0$,
(ii) $|xy| = |x||y|$, and
(iii) $|x + y| \le |x| + |y|$ (the triangle inequality).

Example: $|\frac{3}{4}|_2 = 4$, $|\frac{4}{7}|_2 = \frac{1}{4}$, and $|\frac{3}{4} + \frac{4}{7}|_2 = |\frac{37}{28}|_2 = 4 = \max\{|\frac{3}{4}|_2, |\frac{4}{7}|_2\}$.


The triangle inequality makes R into a metric space and gives rise to the 
familiar notions of convergence. It was a great discovery around 1900 that 
besides the absolute value there are other natural “value functions” on Q 
that satisfy the conditions (i) to (iii). 

Let $p$ be a prime number. Any rational number $r \ne 0$ can be written uniquely in the form
$$r = p^k\,\frac{a}{b}, \quad k \in \mathbb{Z}, \qquad (1)$$
where $a$ and $b > 0$ are relatively prime to $p$. Define the $p$-adic value
$$|r|_p := p^{-k}, \qquad |0|_p := 0. \qquad (2)$$

Conditions (i) and (ii) are obviously satisfied, and for (iii) we obtain the even stronger inequality

(iii') $|x + y|_p \le \max\{|x|_p, |y|_p\}$ (the non-Archimedean property).

Indeed, let $r = p^k\frac{a}{b}$ and $s = p^\ell\frac{c}{d}$, where we may assume that $k \ge \ell$, that is, $|r|_p = p^{-k} \le p^{-\ell} = |s|_p$. Then we get
$$|r + s|_p = \Big|p^k\frac{a}{b} + p^\ell\frac{c}{d}\Big|_p = \Big|p^\ell\Big(p^{k-\ell}\frac{a}{b} + \frac{c}{d}\Big)\Big|_p = p^{-\ell}\,\Big|\frac{p^{k-\ell}ad + bc}{bd}\Big|_p \le p^{-\ell} = \max\{|r|_p, |s|_p\},$$
since the denominator $bd$ is relatively prime to $p$. We also see from this that

(iv) $|x + y|_p = \max\{|x|_p, |y|_p\}$ whenever $|x|_p \ne |y|_p$,

but we will prove below that this property is quite generally implied by (iii').
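The $p$-adic value (2) is straightforward to compute for rationals. The Python sketch below is a hypothetical helper built on `fractions.Fraction`. It strips factors of $p$ from numerator and denominator to find the exponent $k$ in (1). On the margin example it gives $|\frac34|_2 = 4$, $|\frac47|_2 = \frac14$ and $|\frac{37}{28}|_2 = 4$.

```python
from fractions import Fraction


def p_adic_value(r, p):
    """|r|_p = p^(-k) for r = p^k * a/b with a, b prime to p; |0|_p = 0."""
    r = Fraction(r)
    if r == 0:
        return Fraction(0)
    k, num, den = 0, r.numerator, r.denominator
    while num % p == 0:
        num //= p
        k += 1
    while den % p == 0:
        den //= p
        k -= 1
    return Fraction(p) ** (-k)


x, y = Fraction(3, 4), Fraction(4, 7)
print(p_adic_value(x, 2), p_adic_value(y, 2), p_adic_value(x + y, 2))  # 4 1/4 4
```

A quick way to see (iii') and (iv) at work is to run $(x, y)$ over a grid of small fractions. For each pair, compare `p_adic_value(x + y, p)` with the larger of the two individual values.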
Any function $v: K \to \mathbb{R}_{\ge 0}$ on a field $K$ that satisfies
(i) $v(x) = 0$ if and only if $x = 0$,
(ii) $v(xy) = v(x)v(y)$, and
(iii') $v(x + y) \le \max\{v(x), v(y)\}$ (non-Archimedean property)
for all $x, y \in K$ is called a non-Archimedean real valuation of $K$.

For every such valuation $v$ we have $v(1) = v(1)v(1)$, hence $v(1) = 1$; and $1 = v(1) = v((-1)(-1)) = [v(-1)]^2$, so $v(-1) = 1$. Thus from (ii) we get $v(-x) = v(x)$ for all $x$ and $v(x^{-1}) = v(x)^{-1}$ for $x \ne 0$.

Every field has the trivial valuation that maps every nonzero element onto 1, and if $v$ is a real non-Archimedean valuation, then so is $v^t$ for any positive real number $t$. So for $\mathbb{Q}$ we have the $p$-adic valuations and their powers, and a famous theorem of Ostrowski states that any nontrivial real non-Archimedean valuation of $\mathbb{Q}$ is of this form.
As announced, let us verify that the important property

(iv) $v(x + y) = \max\{v(x), v(y)\}$ if $v(x) \ne v(y)$

holds for any non-Archimedean valuation. Indeed, suppose that we have $v(x) < v(y)$. Then

$$v(y) = v((x + y) - x) \le \max\{v(x + y), v(x)\} = v(x + y) \le \max\{v(x), v(y)\} = v(y),$$

where (iii’) yields the inequalities, the first equality is clear, and the other two follow from $v(x) < v(y)$. Thus $v(x + y) = v(y) = \max\{v(x), v(y)\}$.
Monsky’s beautiful approach to the square dissection problem used an extension of the 2-adic valuation $|x|_2$ to a valuation $v$ of $\mathbb{R}$, where “extension” means that we require $v(x) = |x|_2$ whenever $x$ is in $\mathbb{Q}$. Such a non-Archimedean real extension exists, but this is not standard algebra fare. In the following, we present Monsky’s argument in a version due to Hendrik Lenstra that requires much less; it only needs a valuation $v$ that takes values in an arbitrary “ordered group”, not necessarily in $(\mathbb{R}_{>0}, \cdot, <)$, such that $v(\frac{1}{2}) > 1$. The definition and the existence of such a valuation will be provided in the appendix to this chapter.


Here we just note that any valuation with $v(\frac{1}{2}) > 1$ satisfies $v(\frac{1}{n}) = 1$ for odd integers $n$. Indeed, $v(\frac{1}{2}) > 1$ means that $v(2) < 1$, and thus $v(2k) < 1$ by (iii’) and induction on $k$. From this we get $v(2k + 1) = 1$ from (iv), and thus again $v(\frac{1}{2k+1}) = 1$ from (ii).


Monsky’s Theorem. It is not possible to dissect a square into an odd number of triangles of equal area.


Proof. In the following we construct a specific three-coloring of the plane with amazing properties. One of them is that the area of any triangle whose vertices have three different colors (which in the following is called a rainbow triangle) has a v-value larger than 1, so the area cannot be $\frac{1}{n}$ for odd $n$. And then we verify that any dissection of the unit square must contain such a rainbow triangle, and the proof will be complete.


The coloring of the points $(x, y)$ of the real plane will be constructed by looking at the entries of the triple $(x, y, 1)$ that have the maximal value under the valuation $v$. This maximum may occur once or twice or even three times. The color (blue, or green, or red) will record the coordinate of $(x, y, 1)$ in which the maximal v-value occurs first:

$$(x, y) \text{ is colored } \begin{cases} \text{blue} & \text{if } v(x) \ge v(y) \text{ and } v(x) \ge v(1),\\ \text{green} & \text{if } v(x) < v(y) \text{ and } v(y) \ge v(1),\\ \text{red} & \text{if } v(x) < v(1) \text{ and } v(y) < v(1).\end{cases}$$


The property (iv) together with $v(-x) = v(x)$ also implies that $v(a \pm b_1 \pm b_2 \pm \cdots \pm b_\ell) = v(a)$ if $v(a) > v(b_i)$ for all $i$.
This assigns a unique color to each point in the plane. The figure in the
margin shows the color for each point in the unit square whose coordinates 


are fractions of the form a 
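For a dissection whose vertices all have rational coordinates, one can take for $v$ simply the 2-adic value $|\cdot|_2$ on $\mathbb{Q}$. It is a valuation with $v(\frac{1}{2}) = 2 > 1$, so the coloring can be computed exactly. The following is a minimal Python sketch under that assumption (rational input only); `abs2` and `color` are hypothetical helper names.

```python
from fractions import Fraction

def abs2(r: Fraction) -> Fraction:
    """2-adic value |r|_2, used here as the valuation v on the rationals."""
    if r == 0:
        return Fraction(0)
    num, den, k = r.numerator, r.denominator, 0
    while num % 2 == 0:
        num //= 2
        k += 1
    while den % 2 == 0:
        den //= 2
        k -= 1
    return Fraction(2) ** (-k)

def color(x: Fraction, y: Fraction) -> str:
    """Record where max{v(x), v(y), v(1)} occurs first in the triple (x, y, 1)."""
    vx, vy, v1 = abs2(x), abs2(y), Fraction(1)
    if vx >= vy and vx >= v1:
        return "blue"
    if vx < vy and vy >= v1:
        return "green"
    return "red"  # reached only if v(x) < v(1) and v(y) < v(1)

print(color(Fraction(0), Fraction(0)))        # red
print(color(Fraction(1), Fraction(0)))        # blue
print(color(Fraction(0), Fraction(1)))        # green
print(color(Fraction(1, 2), Fraction(1, 4)))  # green
```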


The following statement is the first step to the proof. 


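Lemma 1. For any blue point $p_b = (x_b, y_b)$, green point $p_g = (x_g, y_g)$, and red point $p_r = (x_r, y_r)$, the v-value of the determinant

$$\det\begin{pmatrix} x_b & y_b & 1 \\ x_g & y_g & 1 \\ x_r & y_r & 1 \end{pmatrix}$$

is at least 1.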


Proof. The determinant is a sum of six terms. One of them is the product of the entries of the main diagonal, $x_b y_g \cdot 1$. By construction of the coloring each of the diagonal entries compared to the other entries in the row has a maximal v-value, so comparing with the last entry in each row (which is 1) we get

$$v(x_b y_g 1) = v(x_b)v(y_g)v(1) \ge v(1)v(1)v(1) = 1.$$

Any of the other five summands of the determinant is a product of three matrix entries, one from each row (with a sign that as we know is irrelevant for the v-value). It picks at least one matrix entry below the main diagonal, whose v-value is strictly smaller than that of the diagonal entry in the same row, and at least one matrix entry above the main diagonal, whose v-value is not larger than that of the diagonal entry in the same row. Thus all of the five other summands of the determinant have a v-value that is strictly smaller than the summand corresponding to the main diagonal. Thus by property (iv) of non-Archimedean valuations, we find that the v-value of the determinant is given by the summand corresponding to the main diagonal,

$$v\left(\det\begin{pmatrix} x_b & y_b & 1 \\ x_g & y_g & 1 \\ x_r & y_r & 1 \end{pmatrix}\right) = v(x_b y_g) \ge 1.$$


Corollary. Any line of the plane receives at most two different colors. The area of a rainbow triangle cannot be 0, and it cannot be $\frac{1}{n}$ for odd $n$.


Proof. The area of the triangle with vertices at a blue point $p_b$, a green point $p_g$, and a red point $p_r$ is the absolute value of

$$\tfrac{1}{2}\big((x_b - x_r)(y_g - y_r) - (x_g - x_r)(y_b - y_r)\big),$$

which up to the sign is half the determinant of Lemma 1.
The three points cannot lie on a line since the determinant cannot be 0, as $v(0) = 0$. The area of the triangle cannot be $\frac{1}{n}$, since in this case we would get $\pm\frac{2}{n}$ for the determinant, thus

$$v\Big(\pm\frac{2}{n}\Big) = v\Big(\frac{1}{2}\Big)^{-1} v\Big(\frac{1}{n}\Big) < 1$$

because of $v(\frac{1}{2}) > 1$ and $v(\frac{1}{n}) = 1$, contradicting Lemma 1.
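Lemma 1 and its corollary can be spot-checked on random rational points, again taking $v = |\cdot|_2$ on $\mathbb{Q}$ (our assumption for the sketch, as only rational coordinates occur). The Python sketch below collects thirty points of each color, forms all blue-green-red triples and checks $|\det|_2 \ge 1$. This also forces the area $|\det|/2$ to have 2-adic value above 1, so it is never $\frac{1}{n}$ for odd $n$. The helpers `abs2`, `color` and `det3` are our own.

```python
from fractions import Fraction
import random

def abs2(r: Fraction) -> Fraction:
    """2-adic value |r|_2 on the rationals."""
    if r == 0:
        return Fraction(0)
    num, den, k = r.numerator, r.denominator, 0
    while num % 2 == 0:
        num //= 2
        k += 1
    while den % 2 == 0:
        den //= 2
        k -= 1
    return Fraction(2) ** (-k)

def color(x: Fraction, y: Fraction) -> str:
    vx, vy = abs2(x), abs2(y)
    if vx >= vy and vx >= 1:
        return "blue"
    if vx < vy and vy >= 1:
        return "green"
    return "red"

def det3(pb, pg, pr):
    """Determinant with rows (x, y, 1), expanded after subtracting the last row."""
    (xb, yb), (xg, yg), (xr, yr) = pb, pg, pr
    return (xb - xr) * (yg - yr) - (xg - xr) * (yb - yr)

random.seed(0)
points = {"blue": [], "green": [], "red": []}
while min(len(v) for v in points.values()) < 30:
    p = (Fraction(random.randint(-40, 40), random.randint(1, 40)),
         Fraction(random.randint(-40, 40), random.randint(1, 40)))
    points[color(*p)].append(p)

checked = 0
for pb in points["blue"][:30]:
    for pg in points["green"][:30]:
        for pr in points["red"][:30]:
            d = det3(pb, pg, pr)
            assert abs2(d) >= 1              # Lemma 1 (in particular d != 0)
            assert abs2(abs(d) / 2) > 1      # so the area is never 1/n, n odd
            checked += 1
print(checked)  # 27000
```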
And why did we construct this coloring? Because we are now going to show that in any dissection of the unit square $S = [0,1]^2$ into triangles (equal-sized or not!) there must always be a rainbow triangle, which according to the corollary cannot have area $\frac{1}{n}$ for odd $n$. Thus the following lemma will complete the proof of Monsky’s theorem.


Lemma 2. Every dissection of the unit square $S = [0,1]^2$ into finitely many triangles contains an odd number of rainbow triangles, and thus at least one.


Proof. The following counting argument is truly inspired. The idea is due to Emanuel Sperner, and will reappear with “Sperner’s Lemma” in …


Consider the segments between neighboring vertices in a given dissection. 
A segment is called a red-blue segment if one endpoint is red and the other 
is blue. For the example in the figure, the red-blue segments are drawn in 
purple. 

We make two observations, repeatedly using the fact from the corollary that 
on any line there can be points of at most two colors. 

(A) The bottom line of the square contains an odd number of red-blue seg- 
ments, since (0,0) is red and (1, 0) is blue, and all vertices in between are 
red or blue. So on the walk from the red end to the blue end of the bottom 
line, there must be an odd number of changes between red and blue. The 
other boundary lines of the square contain no red-blue segments. 

(B) If a triangle $T$ has at most two colors at its vertices, then it contains
an even number of red-blue segments on its boundary. However, every 
rainbow triangle has an odd number of red-blue segments on its boundary. 
Indeed, there is an odd number of red-blue segments between a red vertex 
and a blue vertex of a triangle, but an even number (if any) between any 
vertices with a different color combination. Thus a rainbow triangle has an 
odd number of red-blue segments in its boundary, while any other triangle 
has an even number (two or zero) of vertex pairs with the color combination 
red and blue. 


Now let us count the boundary red-blue segments summed over all trian- 
gles in the dissection. Since every red-blue segment in the interior of the 
square is counted twice, and there is an odd number on the boundary of S, 
this count is odd. Hence we conclude from (B) that there must be an odd 
number of rainbow triangles. 
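Lemma 2 and observation (A) can be tried out on a concrete dissection. The following Python sketch is our own illustration. It assumes a grid dissection, in which each cell of an $N \times N$ grid is cut along a diagonal (a hypothetical test case with rational vertices). It colors the vertices with $v = |\cdot|_2$ and checks that the number of rainbow triangles and the number of red-blue segments on the bottom side are both odd.

```python
from fractions import Fraction

def abs2(r: Fraction) -> Fraction:
    """2-adic value |r|_2 on the rationals."""
    if r == 0:
        return Fraction(0)
    num, den, k = r.numerator, r.denominator, 0
    while num % 2 == 0:
        num //= 2
        k += 1
    while den % 2 == 0:
        den //= 2
        k -= 1
    return Fraction(2) ** (-k)

def color(x: Fraction, y: Fraction) -> str:
    vx, vy = abs2(x), abs2(y)
    if vx >= vy and vx >= 1:
        return "blue"
    if vx < vy and vy >= 1:
        return "green"
    return "red"

def grid_dissection(N: int):
    """Hypothetical test dissection: each grid cell cut along a diagonal."""
    h = Fraction(1, N)
    tris = []
    for i in range(N):
        for j in range(N):
            a, b = (i * h, j * h), ((i + 1) * h, j * h)
            c, d = ((i + 1) * h, (j + 1) * h), (i * h, (j + 1) * h)
            tris += [(a, b, c), (a, c, d)]
    return tris

def count_rainbow(tris) -> int:
    return sum(1 for t in tris if {color(*p) for p in t} == {"blue", "green", "red"})

def bottom_red_blue_segments(N: int) -> int:
    """Red-blue segments along the bottom side, as in observation (A)."""
    cols = [color(Fraction(k, N), Fraction(0)) for k in range(N + 1)]
    return sum(1 for u, w in zip(cols, cols[1:]) if {u, w} == {"red", "blue"})

for N in range(1, 13):
    r = count_rainbow(grid_dissection(N))
    assert r % 2 == 1                            # Lemma 2
    assert bottom_red_blue_segments(N) % 2 == 1  # observation (A)
    print(N, r)
```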
Appendix: Extending valuations


It is not at all obvious that an extension of a non-Archimedean real valuation 
from one field to a larger one is always possible. But it can be done, not only 
from Q to R, but generally from any field K to a field L that contains K. 
(This is known as “Chevalley’s theorem”; see for example the book by 
Jacobson [1].) 

In the following, we establish much less, but enough for our application to odd dissections. Indeed, in our proof of Monsky’s theorem we have not used the addition for values of $v : \mathbb{R} \to \mathbb{R}_{\ge 0}$; we have used only the multiplication and the order on $\mathbb{R}_{\ge 0}$. Hence for our argument it is sufficient if the nonzero values of $v$ lie in a (multiplicatively written) ordered abelian group $(G, \cdot, <)$. That is, the elements of $G$ are linearly ordered, and $a < b$ in $G$ implies $ac < bc$ for any $a, b, c \in G$. As we assume that the group is written multiplicatively, the neutral element of $G$ is denoted by 1. For the definition of a valuation, we adjoin a special element 0 with the understanding that $0 \notin G$, $0a = 0$, and $0 < a$ hold for all $a \in G$. Of course, the prime example of an ordered abelian group is $(\mathbb{R}_{>0}, \cdot, <)$ with the usual linear order, and the prime example for $\{0\} \cup G$ is $(\mathbb{R}_{\ge 0}, \cdot)$.


Definition. Let $K$ be a field. A non-Archimedean valuation $v$ with values in an ordered abelian group $G$ is a map $v : K \to \{0\} \cup G$ with

(i) $v(x) = 0 \iff x = 0$,
(ii) $v(xy) = v(x)v(y)$,
(iii’) $v(x + y) \le \max\{v(x), v(y)\}$, and
(iv) $v(x + y) = \max\{v(x), v(y)\}$ whenever $v(x) \ne v(y)$

for all $x, y \in K$.

The fourth condition in this description is again implied by the first three. And among the simple consequences we record that if $v(x) < 1$, $x \ne 0$, then $v(x^{-1}) = v(x)^{-1} > 1$.

So here is what we will establish: 


Theorem. The field of real numbers $\mathbb{R}$ has a non-Archimedean valuation to an ordered abelian group

$$v : \mathbb{R} \to \{0\} \cup G$$

such that $v(\frac{1}{2}) > 1$.


Proof. We first relate any valuation on a field to a subring of the field. (All the subrings that we consider contain 1.) Suppose $v : K \to \{0\} \cup G$ is a valuation; let

$$R := \{x \in K : v(x) \le 1\}, \qquad U := \{x \in K : v(x) = 1\}.$$

It is immediate that $R$ is a subring of $K$, called the valuation ring corresponding to $v$. Furthermore, $v(xx^{-1}) = v(1) = 1$ implies that $v(x) = 1$ if and only if $v(x^{-1}) = 1$. Thus $U$ is the set of units (invertible elements) of $R$. In particular, $U$ is a subgroup of $K^*$, where we write $K^* := K \setminus \{0\}$ for the multiplicative group of $K$. Finally, with $R^{-1} := \{x^{-1} : x \in R \setminus \{0\}\}$ we have $K = R \cup R^{-1}$. Indeed, if $x \notin R$ then $v(x) > 1$ and therefore $v(x^{-1}) < 1$, thus $x^{-1} \in R$. The property $K = R \cup R^{-1}$ already characterizes all possible valuation rings in a given field.
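As a concrete illustration (ours, restricted to $K = \mathbb{Q}$ with the 2-adic valuation): here $R = \{x : |x|_2 \le 1\}$ consists of the fractions whose reduced denominator is odd. $U$ consists of those whose reduced numerator and denominator are both odd. The small Python check below tests $K = R \cup R^{-1}$ and that $U$ is exactly the set of units of $R$. The predicate names `in_R` and `in_U` are hypothetical.

```python
from fractions import Fraction
import random

def in_R(x: Fraction) -> bool:
    """|x|_2 <= 1, i.e. the reduced denominator of x is odd."""
    return x.denominator % 2 == 1

def in_U(x: Fraction) -> bool:
    """|x|_2 == 1, i.e. reduced numerator and denominator are both odd."""
    return x.numerator % 2 == 1 and x.denominator % 2 == 1

random.seed(2)
for _ in range(10000):
    x = Fraction(random.randint(-1000, 1000), random.randint(1, 1000))
    if x == 0:
        assert in_R(x)
        continue
    assert in_R(x) or in_R(1 / x)                 # K = R u R^{-1}
    assert in_U(x) == (in_R(x) and in_R(1 / x))   # U = units of R
```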


Lemma. A proper subring $R \subset K$ is a valuation ring with respect to some valuation $v$ into some ordered group $G$ if and only if $K = R \cup R^{-1}$.


Proof. We have seen one direction. Suppose now $K = R \cup R^{-1}$. How should we construct the group $G$? If $v : K \to \{0\} \cup G$ is a valuation corresponding to $R$, then $v(x) < v(y)$ holds if and only if $v(xy^{-1}) < 1$, that is, if and only if $xy^{-1} \in R \setminus U$. Also, $v(x) = v(y)$ if and only if $xy^{-1} \in U$, or $xU = yU$ as cosets in the factor group $K^*/U$.

Hence the natural way to proceed goes as follows. Take the quotient group $G := K^*/U$, and define an order relation on $G$ by setting

$$xU < yU \quad :\iff \quad xy^{-1} \in R \setminus U.$$

It is a nice exercise to check that this indeed makes $G$ into an ordered group. The map $v : K \to \{0\} \cup G$ is then defined in the most natural way:

$$v(0) := 0, \qquad v(x) := xU \text{ for } x \ne 0.$$

It is easy to verify conditions (i) to (iii’) for $v$, and that $R$ is the valuation ring corresponding to $v$.


In order to establish the theorem, it thus suffices to find a valuation ring $B \subset \mathbb{R}$ such that $\frac{1}{2} \notin B$.

Claim. Any inclusion-maximal subring $B \subset \mathbb{R}$ with the property $\frac{1}{2} \notin B$ is a valuation ring.

First we should perhaps note that a maximal subring $B \subset \mathbb{R}$ with the property $\frac{1}{2} \notin B$ exists. This is not quite trivial, but it does follow with a routine application of Zorn’s lemma, which is reviewed in the box. Indeed, if we have an ascending chain of subrings $B_i \subset \mathbb{R}$ that don’t contain $\frac{1}{2}$, then this chain has an upper bound, given by the union of all the subrings $B_i$, which again is a subring and does not contain $\frac{1}{2}$.


Zorn’s Lemma 


The Lemma of Zorn is of fundamental importance in algebra and 
other parts of mathematics when one wants to construct maximal 
structures. It also plays a decisive role in the logical foundations of 
mathematics. 


Lemma. Suppose $P_\le$ is a nonempty partially ordered set with the property that every ascending chain $(a_i)_\le$ has an upper bound $b$, such that $a_i \le b$ for all $i$. Then $P_\le$ contains a maximal element $M$, meaning that there is no $c \in P$ with $M < c$.

$\mathbb{Z} \subset \mathbb{R}$ is such a subring with $\frac{1}{2} \notin \mathbb{Z}$, but it is not maximal.
To prove the Claim, let us assume that $B \subset \mathbb{R}$ is a maximal subring not containing $\frac{1}{2}$. If $B$ is not a valuation ring, then there is some element $a \in \mathbb{R} \setminus (B \cup B^{-1})$. We denote by $B[a]$ the subring generated by $B \cup \{a\}$, that is, the set of all real numbers that can be written as polynomials in $a$ with coefficients in $B$. Let $2B \subseteq B$ be the subset of all elements of the form $2b$, for $b \in B$. Now $2B$ is a subset of $B$, so we have $2B[a] \subseteq B[a]$ and $2B[a^{-1}] \subseteq B[a^{-1}]$. If we had $2B[a] \ne B[a]$ or $2B[a^{-1}] \ne B[a^{-1}]$, then due to $1 \in B$ this would imply that $\frac{1}{2} \notin B[a]$ resp. $\frac{1}{2} \notin B[a^{-1}]$, contradicting the maximality of $B \subset \mathbb{R}$ as a subring that does not contain $\frac{1}{2}$. Thus we get that $2B[a] = B[a]$ and $2B[a^{-1}] = B[a^{-1}]$. This implies that $1 \in B$ can be written in the form

$$1 = 2u_0 + 2u_1 a + \cdots + 2u_m a^m \quad \text{with } u_i \in B, \qquad (1)$$

and similarly as

$$1 = 2v_0 + 2v_1 a^{-1} + \cdots + 2v_n a^{-n} \quad \text{with } v_i \in B, \qquad (2)$$

which after multiplication by $a^n$ and subtraction of $2v_0 a^n$ from both sides yields

$$(1 - 2v_0)a^n = 2v_1 a^{n-1} + \cdots + 2v_{n-1} a + 2v_n. \qquad (3)$$

Let us assume that these representations are chosen such that $m$ and $n$ are as small as possible. We may also assume that $m \ge n$, otherwise we exchange $a$ with $a^{-1}$, and (1) with (2).

Now multiply (1) by $1 - 2v_0$ and add $2v_0$ on both sides of the equation, to get

$$1 = 2\big(u_0(1 - 2v_0) + v_0\big) + 2u_1(1 - 2v_0)a + \cdots + 2u_m(1 - 2v_0)a^m.$$

But if in this equation we substitute for the term $(1 - 2v_0)a^m$ the expression given by equation (3) multiplied by $a^{m-n}$, then this results in an equation that expresses $1 \in B$ as a polynomial in $2B[a]$ of degree at most $m - 1$. This contradiction to the minimality of $m$ establishes the Claim.


References 


[1] N. JACOBSON: Lectures in Abstract Algebra, Part III: Theory of Fields and 
Galois Theory, Graduate Texts in Mathematics 32, Springer, New York 1975. 


[2] P. MONSKY: On dividing a square into triangles, Amer. Math. Monthly 77 (1970), 161-164.


[3] F. RICHMAN & J. THOMAS: Problem 5471, Amer. Math. Monthly 74 (1967), 
329. 


[4] S. K. STEIN & S. SZABÓ: Algebra and Tiling: Homomorphisms in the Service of Geometry, Carus Math. Monographs 25, MAA, Washington DC 1994.


[5] J. THOMAS: A dissection problem, Math. Magazine 41 (1968), 187-190. 


A theorem of Pólya on polynomials


Among the many contributions of George Pólya to analysis, the following has always been Erdős’ favorite, both for the surprising result and for the beauty of its proof. Suppose that

$$f(z) = z^n + b_{n-1}z^{n-1} + \cdots + b_1 z + b_0$$

is a complex polynomial of degree $n \ge 1$ with leading coefficient 1. Associate with $f(z)$ the set

$$\mathcal{C} := \{z \in \mathbb{C} : |f(z)| \le 2\},$$

that is, $\mathcal{C}$ is the set of points which are mapped under $f$ into the circle of radius 2 around the origin in the complex plane. So for $n = 1$ the domain $\mathcal{C}$ is just a circular disk of diameter 4.

By an astoundingly simple argument, Pólya revealed the following beautiful property of this set $\mathcal{C}$:

Take any line $L$ in the complex plane and consider the orthogonal projection $\mathcal{C}_L$ of the set $\mathcal{C}$ onto $L$. Then the total length of any such projection never exceeds 4.

What do we mean by the total length of the projection $\mathcal{C}_L$ being at most 4? We will see that $\mathcal{C}_L$ is a finite union of disjoint intervals $I_1, \dots, I_t$, and the condition means that $\ell(I_1) + \cdots + \ell(I_t) \le 4$, where $\ell(I_j)$ is the usual length of an interval.

By rotating the plane we see that it suffices to consider the case when $L$ is the real axis of the complex plane. With these comments in mind, let us state Pólya’s result.

Theorem 1. Let $f(z)$ be a complex polynomial of degree at least 1 and leading coefficient 1. Set $\mathcal{C} := \{z \in \mathbb{C} : |f(z)| \le 2\}$ and let $\mathcal{R}$ be the orthogonal projection of $\mathcal{C}$ onto the real axis. Then there are intervals $I_1, \dots, I_t$ on the real line which together cover $\mathcal{R}$ and satisfy

$$\ell(I_1) + \cdots + \ell(I_t) \le 4.$$


Clearly the bound of 4 in the theorem is attained for $n = 1$. To get more of a feeling for the problem let us look at the polynomial $f(z) = z^2 - 2$, which also attains the bound of 4. If $z = x + iy$ is a complex number, then $x$ is its orthogonal projection onto the real line. Hence

$$\mathcal{R} = \{x \in \mathbb{R} : x + iy \in \mathcal{C} \text{ for some } y\}.$$
[Figure: a root $c_k = a_k + ib_k$ in the complex plane]

[Image: Pavnuty Chebyshev on a Soviet stamp from 1946]

The reader can easily prove that for $f(z) = z^2 - 2$ we have $x + iy \in \mathcal{C}$ if and only if

$$(x^2 + y^2)^2 \le 4(x^2 - y^2).$$

It follows that $x^4 \le (x^2 + y^2)^2 \le 4x^2$, and thus $x^2 \le 4$, that is, $|x| \le 2$. On the other hand, any $z = x \in \mathbb{R}$ with $|x| \le 2$ satisfies $|z^2 - 2| \le 2$, and we find that $\mathcal{R}$ is precisely the interval $[-2, 2]$ of length 4.
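A crude numerical look at this example, as a sanity check rather than part of the argument: sample the grid $x, y \in \{-3, -2.99, \dots, 3\}$, keep the points with $|f(z)| \le 2$ for $f(z) = z^2 - 2$, and report the extreme real parts. The grid is an arbitrary choice of ours; it happens to contain the extreme points $z = \pm 2$.

```python
def projection_extent(f, xs, ys):
    """Smallest and largest real part among grid points z = x + iy with |f(z)| <= 2."""
    hits = [x for x in xs for y in ys if abs(f(complex(x, y))) <= 2]
    return min(hits), max(hits)

grid = [k / 100 for k in range(-300, 301)]      # step 0.01 on [-3, 3]
lo, hi = projection_extent(lambda z: z * z - 2, grid, grid)
print(lo, hi)  # -2.0 2.0
```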
As a first step towards the proof write $f(z) = (z - c_1)\cdots(z - c_n)$ with $c_k = a_k + ib_k$, and consider the real polynomial $p(x) = (x - a_1)\cdots(x - a_n)$. Let $z = x + iy \in \mathbb{C}$; then by the theorem of Pythagoras

$$|x - a_k|^2 + |y - b_k|^2 = |z - c_k|^2,$$

and hence $|x - a_k| \le |z - c_k|$ for all $k$, that is, for $z \in \mathcal{C}$,

$$|p(x)| = |x - a_1|\cdots|x - a_n| \le |z - c_1|\cdots|z - c_n| = |f(z)| \le 2.$$
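The factor-by-factor inequality $|p(\operatorname{Re} z)| \le |f(z)|$ is easy to spot-check numerically. The Python sketch below uses random complex roots (our own test setup). Here `poly_at` is a hypothetical helper that evaluates the monic polynomial with given roots. A tiny relative tolerance absorbs floating-point rounding.

```python
import random

def poly_at(roots, z):
    """Evaluate the monic polynomial prod (z - c) over the given roots at z."""
    out = 1
    for c in roots:
        out *= z - c
    return out

random.seed(3)
for _ in range(5000):
    n = random.randint(1, 6)
    roots = [complex(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(n)]
    z = complex(random.uniform(-4, 4), random.uniform(-4, 4))
    lhs = abs(poly_at([c.real for c in roots], z.real))   # |p(x)| with x = Re z
    rhs = abs(poly_at(roots, z))                          # |f(z)|
    assert lhs <= rhs * (1 + 1e-12)
```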


Thus we find that $\mathcal{R}$ is contained in the set $\mathcal{P} = \{x \in \mathbb{R} : |p(x)| \le 2\}$, and if we can show that this latter set is covered by intervals of total length at most 4, then we are done. Accordingly, our main Theorem 1 will be a consequence of the following result.


Theorem 2. Let $p(x)$ be a real polynomial of degree $n \ge 1$ with leading coefficient 1, and all roots real. Then the set $\mathcal{P} = \{x \in \mathbb{R} : |p(x)| \le 2\}$ can be covered by intervals of total length at most 4.


As Pólya shows in his paper [2], Theorem 2 is, in turn, a consequence of the following famous result due to Chebyshev. To make this chapter self-contained, we have included a proof in the appendix (following the beautiful exposition by Pólya and Szegő).


Chebyshev’s Theorem.
Let $p(x)$ be a real polynomial of degree $n \ge 1$ with leading coefficient 1. Then

$$\max_{-1 \le x \le 1} |p(x)| \ge \frac{1}{2^{n-1}}.$$


Let us first note the following immediate consequence. 


Corollary. Let $p(x)$ be a real polynomial of degree $n \ge 1$ with leading coefficient 1, and suppose that $|p(x)| \le 2$ for all $x$ in the interval $[a, b]$. Then $b - a \le 4$.


Proof. Consider the substitution $y = \frac{2}{b-a}(x - a) - 1$. This maps the $x$-interval $[a, b]$ onto the $y$-interval $[-1, 1]$. The corresponding polynomial

$$q(y) = p\Big(\frac{b-a}{2}(y + 1) + a\Big)$$

has leading coefficient $(\frac{b-a}{2})^n$ and satisfies

$$\max_{-1 \le y \le 1} |q(y)| = \max_{a \le x \le b} |p(x)|.$$
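By Chebyshev’s theorem we deduce

$$2 \ge \max_{a \le x \le b} |p(x)| = \Big(\frac{b-a}{2}\Big)^n \max_{-1 \le y \le 1} \Big|\Big(\frac{b-a}{2}\Big)^{-n} q(y)\Big| \ge \Big(\frac{b-a}{2}\Big)^n \frac{1}{2^{n-1}} = 2\Big(\frac{b-a}{4}\Big)^n,$$

and thus $b - a \le 4$, as desired.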


This corollary brings us already very close to the statement of Theorem 2. If the set $\mathcal{P} = \{x : |p(x)| \le 2\}$ is an interval, then the length of $\mathcal{P}$ is at most 4. The set $\mathcal{P}$ may, however, not be an interval, as in the example depicted here, where $\mathcal{P}$ consists of two intervals.

What can we say about $\mathcal{P}$? Since $p(x)$ is a continuous function, we know at any rate that $\mathcal{P}$ is the union of disjoint closed intervals $I_1, I_2, \dots$, and that $p(x)$ assumes the value 2 or $-2$ at each endpoint of an interval $I_j$. This implies that there are only finitely many intervals $I_1, \dots, I_t$, since $p(x)$ can assume any value only finitely often.

Pólya’s wonderful idea was to construct another polynomial $\tilde p(x)$ of degree $n$, again with leading coefficient 1, such that $\tilde{\mathcal{P}} = \{x : |\tilde p(x)| \le 2\}$ is an interval of length at least $\ell(I_1) + \cdots + \ell(I_t)$. The corollary then proves

$$\ell(I_1) + \cdots + \ell(I_t) \le \ell(\tilde{\mathcal{P}}) \le 4,$$

and we are done.


Proof of Theorem 2. Consider $p(x) = (x - a_1)\cdots(x - a_n)$ with $\mathcal{P} = \{x \in \mathbb{R} : |p(x)| \le 2\} = I_1 \cup \cdots \cup I_t$, where we arrange the intervals $I_j$ such that $I_1$ is the leftmost and $I_t$ the rightmost interval. First we claim that any interval $I_j$ contains a root of $p(x)$. We know that $p(x)$ assumes the values 2 or $-2$ at the endpoints of $I_j$. If one value is 2 and the other $-2$, then there is certainly a root in $I_j$. So assume $p(x) = 2$ at both endpoints (the case $-2$ being analogous). Suppose $b \in I_j$ is a point where $p(x)$ assumes its minimum in $I_j$. Then $p'(b) = 0$ and $p''(b) \ge 0$. If $p''(b) = 0$, then $b$ is a multiple root of $p'(x)$, and hence a root of $p(x)$ by Fact 1 from the box on the next page. If, on the other hand, $p''(b) > 0$, then we deduce $p(b) \le 0$ from Fact 2 from the same box. Hence either $p(b) = 0$, and we have our root, or $p(b) < 0$, and we obtain a root in the interval from $b$ to either endpoint of $I_j$.

Here is the final idea of the proof. Let $I_1, \dots, I_t$ be the intervals as before, and suppose the rightmost interval $I_t$ contains $m$ roots of $p(x)$, counted with their multiplicities. If $m = n$, then $I_t$ is the only interval (by what we just proved), and we are finished. So assume $m < n$, and let $d$ be the distance between $I_{t-1}$ and $I_t$ as in the figure. Let $b_1, \dots, b_m$ be the roots of $p(x)$ which lie in $I_t$ and $c_1, \dots, c_{n-m}$ the remaining roots. We now write $p(x) = q(x)r(x)$ where $q(x) = (x - b_1)\cdots(x - b_m)$ and $r(x) = (x - c_1)\cdots(x - c_{n-m})$, and set $p_1(x) = q(x + d)r(x)$. The polynomial $p_1(x)$ is again of degree $n$ with leading coefficient 1. For $x \in I_1 \cup \cdots \cup I_{t-1}$ we have $|x + d - b_i| < |x - b_i|$ for all $i$, and hence $|q(x + d)| < |q(x)|$. It follows that

$$|p_1(x)| \le |p(x)| \le 2 \quad \text{for } x \in I_1 \cup \cdots \cup I_{t-1}.$$

If, on the other hand, $x \in I_t$, then we find $|r(x - d)| \le |r(x)|$ and thus

$$|p_1(x - d)| = |q(x)|\,|r(x - d)| \le |p(x)| \le 2,$$


which means that $I_t - d \subseteq \mathcal{P}_1 = \{x : |p_1(x)| \le 2\}$. In summary, we see that $\mathcal{P}_1$ contains $I_1 \cup \cdots \cup I_{t-1} \cup (I_t - d)$ and hence has total length at least as large as $\mathcal{P}$. Notice now that with the passage from $p(x)$ to $p_1(x)$ the intervals $I_{t-1}$ and $I_t - d$ merge into a single interval. We conclude that the intervals $J_1, \dots, J_s$ of $p_1(x)$ making up $\mathcal{P}_1$ have total length at least $\ell(I_1) + \cdots + \ell(I_t)$, and that the rightmost interval $J_s$ contains more than $m$ roots of $p_1(x)$. Repeating this procedure at most $t - 1$ times, we finally arrive at a polynomial $\tilde p(x)$ with $\tilde{\mathcal{P}} = \{x : |\tilde p(x)| \le 2\}$ being an interval of length $\ell(\tilde{\mathcal{P}}) \ge \ell(I_1) + \cdots + \ell(I_t)$, and the proof is complete.

For the polynomial $p(x) = x^2(x - 3)$ we get $\mathcal{P} = [1 - \sqrt{3}, 1] \cup [1 + \sqrt{3}, \approx 3.2]$.
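The margin example can be checked numerically. The Python sketch below is our own illustration. It scans a fine grid on $[-5, 5]$ (bounds and step chosen by us), keeps the points with $|p(x)| \le 2$ for $p(x) = x^2(x - 3)$, and merges consecutive grid points into intervals. The endpoints are therefore only accurate to about the step size.

```python
import math

def sublevel_intervals(p, lo, hi, steps):
    """Approximate {x in [lo, hi] : |p(x)| <= 2} as a list of (start, end) pairs."""
    h = (hi - lo) / steps
    intervals, start, prev = [], None, None
    for k in range(steps + 1):
        x = lo + k * h
        if abs(p(x)) <= 2:
            if start is None:
                start = x
            prev = x
        elif start is not None:
            intervals.append((start, prev))
            start = None
    if start is not None:
        intervals.append((start, prev))
    return intervals

p = lambda x: x * x * (x - 3)
ivs = sublevel_intervals(p, -5.0, 5.0, 200_000)
for a, b in ivs:
    print(f"[{a:.3f}, {b:.3f}]")
print("exact endpoints:", 1 - math.sqrt(3), 1, 1 + math.sqrt(3))
total = sum(b - a for a, b in ivs)
print(f"total length {total:.3f}")  # well below the bound 4
```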


Two facts about polynomials with real roots 
Let p(x) be a nonconstant polynomial with only real roots. 
Fact 1. If $b$ is a multiple root of $p'(x)$, then $b$ is also a root of $p(x)$.


Proof. Let $b_1 < \cdots < b_r$ be the roots of $p(x)$ with multiplicities $s_1, \dots, s_r$, $\sum_{j=1}^r s_j = n$. From $p(x) = (x - b_j)^{s_j}h(x)$ we infer that $b_j$ is a root of $p'(x)$ if $s_j \ge 2$, and the multiplicity of $b_j$ in $p'(x)$ is $s_j - 1$. Furthermore, there is a root of $p'(x)$ between $b_1$ and $b_2$, another root between $b_2$ and $b_3$, ..., and one between $b_{r-1}$ and $b_r$, and all these roots must be single roots, since $\sum_{j=1}^r (s_j - 1) + (r - 1)$ counts already up to the degree $n - 1$ of $p'(x)$. Consequently, the multiple roots of $p'(x)$ can only occur among the roots of $p(x)$.


Fact 2. We have $p'(x)^2 \ge p(x)p''(x)$ for all $x \in \mathbb{R}$.


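Proof. If $x = a_k$ is a root of $p(x)$, then there is nothing to show. Assume then $x$ is not a root. The product rule of differentiation yields

$$p'(x) = \sum_{k=1}^n \frac{p(x)}{x - a_k}, \qquad\text{that is,}\qquad \frac{p'(x)}{p(x)} = \sum_{k=1}^n \frac{1}{x - a_k}.$$

Differentiating this again we have

$$\frac{p''(x)p(x) - p'(x)^2}{p(x)^2} = -\sum_{k=1}^n \frac{1}{(x - a_k)^2} < 0,$$

and hence $p'(x)^2 > p(x)p''(x)$.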
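Fact 2 can be tested on random real-rooted polynomials. In the Python sketch below (an illustration of ours), the polynomial is expanded from its roots, differentiated twice on the coefficient list, and $p'(x)^2 \ge p(x)p''(x)$ is checked at random points. The helper names are hypothetical, and a small relative tolerance allows for floating-point rounding.

```python
import random

def poly_from_roots(roots):
    """Coefficients of prod (x - a_k), lowest degree first."""
    coeffs = [1.0]
    for a in roots:
        shifted = [0.0] + coeffs                  # x * q(x)
        scaled = [a * c for c in coeffs] + [0.0]  # a * q(x)
        coeffs = [s - t for s, t in zip(shifted, scaled)]
    return coeffs

def derivative(coeffs):
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    return sum(c * x**k for k, c in enumerate(coeffs))

random.seed(5)
for _ in range(5000):
    roots = [random.uniform(-2, 2) for _ in range(random.randint(1, 7))]
    c0 = poly_from_roots(roots)
    c1, c2 = derivative(c0), derivative(derivative(c0))
    x = random.uniform(-3, 3)
    p0, p1, p2 = evaluate(c0, x), evaluate(c1, x), evaluate(c2, x)
    tol = 1e-9 * max(1.0, p1 * p1, abs(p0 * p2))
    assert p1 * p1 >= p0 * p2 - tol               # Fact 2
```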
Appendix: Chebyshev’s theorem


Theorem. Let $p(x)$ be a real polynomial of degree $n \ge 1$ with leading coefficient 1. Then

$$\max_{-1 \le x \le 1} |p(x)| \ge \frac{1}{2^{n-1}}.$$
Before we start, let us look at some examples where we have equality. The 
margin depicts the graphs of polynomials of degrees 1, 2 and 3, where we 
have equality in each case. Indeed, we will see that for every degree there 
is precisely one polynomial with equality in Chebyshev’s theorem. 
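The equality examples from the margin, $p_1(x) = x$, $p_2(x) = x^2 - \frac{1}{2}$ and $p_3(x) = x^3 - \frac{3}{4}x$, are easy to test numerically. In fact the proof of (B) below only looks at the points $\vartheta = \frac{k}{n}\pi$, so already the values $p(\cos\frac{k}{n}\pi)$, $k = 0, \dots, n$, must reach $\frac{1}{2^{n-1}}$ in absolute value. The Python sketch below checks this on the margin examples and on random monic polynomials. `max_at_nodes` is our own helper name.

```python
import math
import random

def max_at_nodes(p, n):
    """max |p(cos(k*pi/n))| for k = 0, ..., n (the points used in the proof of (B))."""
    return max(abs(p(math.cos(k * math.pi / n))) for k in range(n + 1))

# Margin examples p_1, p_2, p_3: both printed values agree up to rounding.
examples = {1: lambda x: x, 2: lambda x: x**2 - 0.5, 3: lambda x: x**3 - 0.75 * x}
for n, p in examples.items():
    print(n, max_at_nodes(p, n), 1 / 2 ** (n - 1))

# Random monic polynomials never beat the bound 1/2**(n-1).
random.seed(4)
for _ in range(1000):
    n = random.randint(1, 8)
    a = [random.uniform(-3, 3) for _ in range(n)]   # a_0, ..., a_{n-1}
    p = lambda x, a=a, n=n: x**n + sum(c * x**k for k, c in enumerate(a))
    assert max_at_nodes(p, n) >= 1 / 2 ** (n - 1) - 1e-12
```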


Proof. Consider a real polynomial $p(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_0$ with leading coefficient 1. Since we are interested in the range $-1 \le x \le 1$, we set $x = \cos\vartheta$ and denote by $g(\vartheta) := p(\cos\vartheta)$ the resulting polynomial in $\cos\vartheta$,

$$g(\vartheta) = (\cos\vartheta)^n + a_{n-1}(\cos\vartheta)^{n-1} + \cdots + a_0. \qquad (1)$$


The proof proceeds now in the following two steps which are both classical 
results and interesting in their own right. 


(A) We express $g(\vartheta)$ as a so-called cosine polynomial, that is, a polynomial of the form

$$g(\vartheta) = b_n\cos n\vartheta + b_{n-1}\cos(n-1)\vartheta + \cdots + b_1\cos\vartheta + b_0 \qquad (2)$$

with $b_i \in \mathbb{R}$, and show that its leading coefficient is $b_n = \frac{1}{2^{n-1}}$.


(B) Given any cosine polynomial $h(\vartheta)$ of order $n$ (meaning that $\lambda_n$ is the highest nonvanishing coefficient)

$$h(\vartheta) = \lambda_n\cos n\vartheta + \lambda_{n-1}\cos(n-1)\vartheta + \cdots + \lambda_0, \qquad (3)$$

we show $|\lambda_n| \le \max|h(\vartheta)|$, which when applied to $g(\vartheta)$ will then prove the theorem.


Proof of (A). To pass from (1) to the representation (2), we have to express all powers $(\cos\vartheta)^k$ as cosine polynomials. For example, the addition theorem for the cosine gives

$$\cos 2\vartheta = \cos^2\vartheta - \sin^2\vartheta = 2\cos^2\vartheta - 1,$$

so that $\cos^2\vartheta = \frac{1}{2}\cos 2\vartheta + \frac{1}{2}$. To do this for an arbitrary power $(\cos\vartheta)^k$ we go into the complex numbers, via the relation $e^{ix} = \cos x + i\sin x$. The $e^{ix}$ are the complex numbers of absolute value 1 (see the box on complex unit roots on page 37). In particular, this yields

$$e^{in\vartheta} = \cos n\vartheta + i\sin n\vartheta. \qquad (4)$$

On the other hand,

$$e^{in\vartheta} = (e^{i\vartheta})^n = (\cos\vartheta + i\sin\vartheta)^n. \qquad (5)$$


The polynomials $p_1(x) = x$, $p_2(x) = x^2 - \frac{1}{2}$ and $p_3(x) = x^3 - \frac{3}{4}x$ achieve equality in Chebyshev’s theorem.
$\sum_{\ell \ge 0}\binom{n}{2\ell} = 2^{n-1}$ holds for $n > 0$: Every subset of $\{1, 2, \dots, n-1\}$ yields an even sized subset of $\{1, 2, \dots, n\}$ if we add the element $n$ “if needed.”


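Equating the real parts in (4) and (5) we obtain by $i^{4\ell+2} = -1$, $i^{4\ell} = 1$ and $\sin^2\vartheta = 1 - \cos^2\vartheta$

$$\cos n\vartheta = \sum_{\ell \ge 0}\binom{n}{4\ell}(\cos\vartheta)^{n-4\ell}(1 - \cos^2\vartheta)^{2\ell} - \sum_{\ell \ge 0}\binom{n}{4\ell+2}(\cos\vartheta)^{n-4\ell-2}(1 - \cos^2\vartheta)^{2\ell+1}. \qquad (6)$$

We conclude that $\cos n\vartheta$ is a polynomial in $\cos\vartheta$,

$$\cos n\vartheta = c_n(\cos\vartheta)^n + c_{n-1}(\cos\vartheta)^{n-1} + \cdots + c_0. \qquad (7)$$

From (6) we obtain for the highest coefficient

$$c_n = \sum_{\ell \ge 0}\left(\binom{n}{4\ell} + \binom{n}{4\ell+2}\right) = 2^{n-1}.$$

Now we turn our argument around. Assuming by induction that for $k < n$, $(\cos\vartheta)^k$ can be expressed as a cosine polynomial of order $k$, we infer from (7) that $(\cos\vartheta)^n$ can be written as a cosine polynomial of order $n$ with leading coefficient $b_n = \frac{1}{c_n} = \frac{1}{2^{n-1}}$.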
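The value $c_n = 2^{n-1}$ can be cross-checked by computer. The Python sketch below is our own. It builds the coefficients of $\cos n\vartheta$ as a polynomial in $\cos\vartheta$ with the standard recurrence $T_{k+1}(x) = 2xT_k(x) - T_{k-1}(x)$, which is a different route from formula (6). It then compares the leading coefficient with the binomial sum from (6), and checks (7) numerically at one sample angle.

```python
import math
from math import comb

def chebyshev_T(n):
    """Integer coefficients (lowest degree first) of T_n with cos(n t) = T_n(cos t).
    Uses the recurrence T_{k+1} = 2x T_k - T_{k-1}, not formula (6)."""
    if n == 0:
        return [1]
    prev, cur = [1], [0, 1]                  # T_0 = 1, T_1 = x
    for _ in range(n - 1):
        nxt = [0] + [2 * c for c in cur]     # 2x * T_k
        for i, c in enumerate(prev):
            nxt[i] -= c                      # minus T_{k-1}
        prev, cur = cur, nxt
    return cur

for n in range(1, 11):
    c_n = chebyshev_T(n)[-1]
    via_6 = sum(comb(n, 4 * l) + comb(n, 4 * l + 2) for l in range(n // 4 + 1))
    assert c_n == via_6 == 2 ** (n - 1)      # hence b_n = 1/2**(n-1)

# Numerical check of (7) at a sample angle.
t = 0.7
for n in range(8):
    value = sum(c * math.cos(t) ** k for k, c in enumerate(chebyshev_T(n)))
    assert math.isclose(value, math.cos(n * t), abs_tol=1e-12)

print(chebyshev_T(3))  # [0, -3, 0, 4], i.e. cos 3t = 4cos^3 t - 3cos t
```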
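Proof of (B). Let $h(\vartheta)$ be a cosine polynomial of order $n$ as in (3), and assume without loss of generality $\lambda_n > 0$. Now we set $m(\vartheta) := \lambda_n\cos n\vartheta$ and find

$$m\Big(\frac{k}{n}\pi\Big) = (-1)^k\lambda_n \quad\text{for } k = 0, 1, \dots, n.$$

Suppose, for a proof by contradiction, that $\max|h(\vartheta)| < \lambda_n$. Then

$$m\Big(\frac{k}{n}\pi\Big) - h\Big(\frac{k}{n}\pi\Big) = (-1)^k\lambda_n - h\Big(\frac{k}{n}\pi\Big)$$

is positive for even $k$ and negative for odd $k$ in the range $0 \le k \le n$. We conclude that $m(\vartheta) - h(\vartheta)$ has at least $n$ roots in the interval $[0, \pi]$. But this cannot be since $m(\vartheta) - h(\vartheta)$ is a cosine polynomial of order at most $n - 1$, and thus has at most $n - 1$ roots in $[0, \pi]$ (by (7) it is a polynomial of degree at most $n - 1$ in $\cos\vartheta$, and $\cos$ is injective on $[0, \pi]$).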


The proof of (B) and thus of Chebyshev’s theorem is complete. 


The energetic reader is now invited to complete the analysis, showing that $g_n(\vartheta) := \frac{1}{2^{n-1}}\cos n\vartheta$ is the only cosine polynomial of order $n$ with leading coefficient 1 (as a polynomial in $\cos\vartheta$) that achieves the equality $\max|g(\vartheta)| = \frac{1}{2^{n-1}}$.

The polynomials $T_n(x) = \cos n\vartheta$, $x = \cos\vartheta$, are called the Chebyshev polynomials (of the first kind); thus $\frac{1}{2^{n-1}}T_n(x)$ is the unique monic polynomial of degree $n$ where equality holds in Chebyshev’s theorem.
Van der Waerden’s permanent conjecture


Suppose $M = (m_{ij})$ is a real $n \times n$ matrix. If in the usual representation of the determinant we omit the signs of the permutations, we get the permanent $\operatorname{per} M$,

$$\operatorname{per} M = \sum_{\sigma} m_{1\sigma(1)}m_{2\sigma(2)}\cdots m_{n\sigma(n)},$$

where $\sigma$ runs through all permutations of $\{1, 2, \dots, n\}$.

In contrast to the determinant, which can be quickly calculated (e.g. by 
Gaussian elimination), computation of the permanent is provably difficult. 
Therefore a lot of research about permanents concerned bounds and 
approximation; the book by Minc [7] gives an excellent overview of the 
subject. 

We consider in this chapter the most famous theorem about permanents and its fabulous recent proof. A real matrix $M = (m_{ij})$ is called doubly stochastic if its entries are nonnegative, with each row sum and column sum equal to 1. In 1926 Bartel L. van der Waerden asked whether

$$\operatorname{per} M \ge \frac{n!}{n^n}$$

holds for every doubly stochastic $n \times n$ matrix, the minimum being attained only by the matrix $M = (m_{ij})$, where $m_{ij} = \frac{1}{n}$ for all $i$ and $j$.

This “Van der Waerden conjecture” remained unsolved for over fifty years, until it was confirmed (more or less independently and more or less simultaneously) by G. P. Egorychev and D. I. Falikman in 1981. The paper [5]
by Jacobus van Lint gives a very readable account of the history of the 
conjecture and the proofs. 


The arguments of Egorychev and Falikman were rather involved, so it was 
a great surprise when in 2007 Leonid Gurvits presented a short, elegant, 
and completely different proof. In fact, he proved a stronger statement that 
included other previous results in this area as well. 


Theorem. Let $M = (m_{ij})$ be a doubly stochastic $n \times n$ matrix. Then

$$\operatorname{per} M \ge \frac{n!}{n^n},$$

and equality holds if and only if $m_{ij} = \frac{1}{n}$ for all $i$ and $j$.
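For small $n$ the permanent can be computed by brute force over all $n!$ permutations, which makes the statement easy to explore. The Python sketch below is our own illustration. The test matrices are convex combinations of random permutation matrices, which are doubly stochastic by construction. Exact rationals via `fractions.Fraction` avoid rounding. The helper names `per` and `random_doubly_stochastic` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial, prod
import random

def per(M):
    """Permanent by brute force over all permutations (only sensible for small n)."""
    n = len(M)
    return sum(prod(M[i][s[i]] for i in range(n)) for s in permutations(range(n)))

def random_doubly_stochastic(n, terms=5):
    """Convex combination of random permutation matrices, with exact rationals."""
    weights = [Fraction(random.randint(1, 9)) for _ in range(terms)]
    total = sum(weights)
    M = [[Fraction(0)] * n for _ in range(n)]
    for w in weights:
        s = random.sample(range(n), n)
        for i in range(n):
            M[i][s[i]] += w / total
    return M

for n in range(1, 6):
    bound = Fraction(factorial(n), n ** n)
    J = [[Fraction(1, n)] * n for _ in range(n)]
    assert per(J) == bound                     # the extremal matrix
    random.seed(n)
    for _ in range(20):
        assert per(random_doubly_stochastic(n)) >= bound

print(per([[1, 2], [3, 4]]))  # 10 = 1*4 + 2*3
```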
Bartel Leendert van der Waerden


However, in 1969 Van der Waerden told 
his compatriot Van Lint that he had 
never heard of such a conjecture nor of 
his name being attached to it... 


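$$\operatorname{per}\begin{pmatrix} \frac{1}{n} & \cdots & \frac{1}{n} \\ \vdots & \ddots & \vdots \\ \frac{1}{n} & \cdots & \frac{1}{n} \end{pmatrix} = \frac{n!}{n^n}$$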
For example, for $M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ we get

$$p_M(x_1, x_2) = (ax_1 + bx_2)(cx_1 + dx_2) = acx_1^2 + (ad + bc)x_1x_2 + bdx_2^2.$$

For the polynomial in the margin above, we obtain $q_1(x_1) = (ad + bc)x_1$ and $q_0 = ad + bc$.


For our presentation of the proof we follow closely the beautiful exposition 
of Gurvits’ work by Monique Laurent and Alexander Schrijver in [4]. 


As first step let us translate matrices into polynomials. To every $n \times n$ matrix $M = (m_{ij})$ we associate the polynomial $p_M(x) \in \mathbb{R}[x_1, \dots, x_n]$,

$$p_M(x) = p_M(x_1, \dots, x_n) := \prod_{i=1}^n\Big(\sum_{j=1}^n m_{ij}x_j\Big).$$

Since every term picks a variable from each row, $p_M(x)$ is homogeneous of degree $n$, meaning that every monomial $x_1^{k_1}\cdots x_n^{k_n}$ has total degree $k_1 + \cdots + k_n = n$. Note that $p_M(x)$ may be the zero-polynomial with all coefficients equal to 0, which happens for example if $M$ has a zero row. It is convenient to include this case and still regard $p_M(x) \equiv 0$ as homogeneous of degree $n$, when the set of variables is clear.


Next we define for $p(x) \in \mathbb{R}[x_1, \dots, x_n]$ its derivative in $x_n$:

$$p'(x_1, \dots, x_{n-1}) := \frac{\partial p(x)}{\partial x_n}\Big|_{x_n = 0}.$$

Observe that if $p$ is homogeneous of degree $n$ in $n$ variables, then $p'$ is homogeneous of degree $n - 1$ in $n - 1$ variables. Indeed, since exactly the monomials of $p(x)$ that are linear in $x_n$ survive in $p'$, the degree decreases by 1.


In general, we define for $i = 0, 1, \dots, n$

$$q_i(x_1, \dots, x_i) := \frac{\partial^{n-i}p(x)}{\partial x_{i+1}\cdots\partial x_n}\Big|_{x_{i+1} = \cdots = x_n = 0}.$$

In this way we get a chain $(q_n, q_{n-1}, \dots, q_0)$, where $q_n = p$ and $q_{i-1} = q_i'$ for $1 \le i \le n$, and finally $q_0$ is the coefficient of $x_1x_2\cdots x_n$ in $p$. Furthermore, if $p$ is homogeneous of degree $n$, then $q_i$ is homogeneous of degree $i$.


Let us look at the chain generated by $p_M(x)$,

$$p_M(x) = q_n, \dots, q_i, \dots, q_1, q_0.$$

The following two facts will be important:

A. $\operatorname{per} M$ is the coefficient of $x_1x_2\cdots x_n$ in $q_n$, thus $q_0 = \operatorname{per} M$. This holds by the definition of the permanent.

B. For $i = 1, \dots, n$ we have

$$\deg_i q_i \le \min\{i, \lambda_M(i)\}, \qquad (1)$$

where $\deg_i q_i$ denotes the degree of $x_i$ in $q_i(x_1, \dots, x_i)$ and $\lambda_M(i)$ records the number of nonzero entries in the $i$th column of $M$.

Indeed, we have $\deg_i q_i \le i$, because $q_i$ is homogeneous of degree $i$, while $\deg_i q_i \le \deg_i q_n \le \lambda_M(i)$ is clear from the definition of $p_M(x)$.
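Facts A and B can be observed directly on a small example. In the Python sketch below (an illustration of ours), polynomials are dictionaries that map exponent tuples to coefficients. The function `prime` implements the operation $p \mapsto p'$, taking $\partial/\partial x_n$ at $x_n = 0$, so that only monomials linear in the last variable survive. The 3×3 doubly stochastic test matrix is our own choice, and all helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def p_M(M):
    """p_M(x) = prod_i (sum_j m_ij x_j) as {exponent tuple: coefficient}."""
    n = len(M)
    poly = {(0,) * n: 1}
    for row in M:
        new = {}
        for exps, c in poly.items():
            for j, m in enumerate(row):
                if m != 0:
                    e = list(exps)
                    e[j] += 1
                    new[tuple(e)] = new.get(tuple(e), 0) + c * m
        poly = new
    return poly

def prime(poly):
    """d/dx_k at x_k = 0 for the last variable x_k: keep monomials linear in x_k."""
    out = {}
    for exps, c in poly.items():
        if exps[-1] == 1:
            out[exps[:-1]] = out.get(exps[:-1], 0) + c
    return out

def per(M):
    n = len(M)
    return sum(prod(M[i][s[i]] for i in range(n)) for s in permutations(range(n)))

M = [[Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)],
     [Fraction(1, 6), Fraction(1, 2), Fraction(1, 3)],
     [Fraction(1, 3), Fraction(1, 6), Fraction(1, 2)]]
q, n = p_M(M), len(M)                  # q = q_n
for i in range(n, 0, -1):
    nonzero = sum(1 for row in M if row[i - 1] != 0)       # lambda_M(i)
    assert all(e[i - 1] <= min(i, nonzero) for e in q)     # fact B
    q = prime(q)                       # q_{i-1}
q0 = q.get((), 0)
print(q0, per(M))  # 1/4 1/4  (fact A)
```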
Here comes the main idea of the proof: We associate a parameter to every polynomial $p$ and bound it from below when passing from $p$ to $p'$.

Before going on, let us fix some notation. We will let $\mathbb{R}_+$ denote the nonnegative reals, and $p(x) \in \mathbb{R}_+[x_1, \dots, x_n]$ means that all coefficients of $p(x)$ are nonnegative. For a complex number $z \in \mathbb{C}$, let $\operatorname{Re}(z)$ and $\operatorname{Im}(z)$ be the real and the imaginary part, respectively. Let $\mathbb{C}_+ = \{z \in \mathbb{C} : \operatorname{Re}(z) \ge 0\}$ and $\mathbb{C}_{++} = \{z \in \mathbb{C} : \operatorname{Re}(z) > 0\}$ denote closed and open right complex half-planes. The notation extends to $\mathbb{R}^n_+$ and $\mathbb{C}^n_+$, etc. Thus, for example, $z = (z_1, \dots, z_n) \in \mathbb{C}^n_{++}$ holds if $\operatorname{Re}(z_i) > 0$ for all $i$.

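For every polynomial $p(x) \in \mathbb{R}_+[x_1,\dots,x_n]$ define the capacity $\operatorname{cap}(p)$
by
$$\operatorname{cap}(p) := \inf\Big\{p(x) : x \in \mathbb{R}^n_+,\ \prod_{i=1}^n x_i = 1\Big\}.$$
In particular, $\operatorname{cap}(p) \ge 0$ as $p$ has only nonnegative coefficients, and if $p$ is
the constant polynomial $p(x) = c$, then $\operatorname{cap}(p) = c$.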


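We also need the function $g : \mathbb{N}_0 \to \mathbb{R}$, defined by $g(0) := 1$ and
$$g(k) := \Big(\frac{k-1}{k}\Big)^{k-1} \quad \text{for } k \ge 1.$$
Using $1 + x < e^x$ twice, which holds strictly except for $x = 0$, we get
$$\frac{g(k+1)}{g(k)} = \Big(1 + \frac{1}{k^2-1}\Big)^{k-1}\Big(1 - \frac{1}{k+1}\Big) < e^{\frac{k-1}{k^2-1}}\, e^{-\frac{1}{k+1}} = 1$$
for $k \ge 2$. Together with $g(0) = g(1) = 1 > \frac12 = g(2)$, this shows that $g$ is non-increasing, $g(0) = g(1) > g(2) > g(3) > \cdots$.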
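A few exact values make the monotonicity of $g$, and its limit $1/e$ noted in the margin, concrete. The sketch below uses exact rationals. It relies on the convention $0^0 = 1$ for $g(1)$, which Python's `Fraction` power also follows.

```python
from fractions import Fraction
import math

def g(k):
    """g(0) = 1 and g(k) = ((k-1)/k)^(k-1) for k >= 1 (so g(1) = 0^0 = 1)."""
    if k == 0:
        return Fraction(1)
    return Fraction(k - 1, k) ** (k - 1)

values = [g(k) for k in range(12)]
# g(0) = g(1) = 1, and from k = 1 on the values strictly decrease
assert values[0] == values[1] == 1
assert all(values[k] > values[k + 1] for k in range(1, 11))
print([str(v) for v in values[:5]])   # prints: ['1', '1', '1/2', '4/9', '27/64']
print(float(g(1000)), 1 / math.e)     # slowly approaching 1/e
```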
Call a polynomial $p(x) \in \mathbb{R}[x_1,\dots,x_n]$ H-stable if it has no roots in $\mathbb{C}^n_{++}$.


The following result is the key step. We postpone its proof for the moment 
and show first how it immediately implies the Theorem. 


Gurvits’ Proposition.

If $p(x) \in \mathbb{R}_+[x_1,\dots,x_n]$ is H-stable and homogeneous of degree $n$, then
either $p' \equiv 0$, or $p'$ is H-stable and homogeneous of degree $n-1$. In either
case
$$\operatorname{cap}(p') \ge \operatorname{cap}(p)\cdot g(\deg_n p).$$
Proof of the Theorem. Let $M = (m_{ij})$ be a doubly stochastic $n\times n$
matrix. We already know that $p_M(x)$ is homogeneous of degree $n$.

Claim 1. $p_M(x)$ is H-stable.

Suppose $x$ is a root of $p_M(x)$. From $p_M(x) = \prod_{i=1}^n\big(\sum_{j=1}^n m_{ij}x_j\big) = 0$ it
follows that $\sum_{j=1}^n m_{ij}x_j = 0$ for some $i$, thus $\sum_{j=1}^n m_{ij}\operatorname{Re}(x_j) = 0$. But
this precludes $x \in \mathbb{C}^n_{++}$, since $m_{ik} > 0$ for some $k$.

Claim 2. $\operatorname{cap}(p_M) = 1$.


Take any $x \in \mathbb{R}^n_+$ with $\prod_{j=1}^n x_j = 1$. By the inequality of the arithmetic
and the geometric mean (see Chapter 20) we have
$$p_M(x) = \prod_{i=1}^n\Big(\sum_{j=1}^n m_{ij}x_j\Big) \ge \prod_{i=1}^n\prod_{j=1}^n x_j^{m_{ij}} = \prod_{j=1}^n x_j^{\sum_{i=1}^n m_{ij}} = \prod_{j=1}^n x_j = 1,$$
and thus $\operatorname{cap}(p_M) \ge 1$. On the other hand,
$$p_M(1,\dots,1) = \prod_{i=1}^n\Big(\sum_{j=1}^n m_{ij}\Big) = \prod_{i=1}^n 1 = 1,$$
which proves Claim 2.

Writing $g(k) = \big(1 - \frac1k\big)^{k-1}$, we see that $\lim_{k\to\infty} g(k) = \frac1e$.

The AM-GM inequality: For $a_1,\dots,a_n, p_1,\dots,p_n \in \mathbb{R}_+$ with $\sum_{i=1}^n p_i = 1$ we get
$$\sum_{i=1}^n p_i a_i \ge a_1^{p_1}\cdots a_n^{p_n}.$$
Recall for this that $m_{ij} \ge 0$ and $\sum_{j=1}^n m_{ij} = 1$.

which proves Claim 2.
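Claim 2 can also be probed numerically. The following sketch samples random points $x \in \mathbb{R}^3_{++}$ with $x_1x_2x_3 = 1$ and confirms that $p_M(x)$ never falls below 1 for an example doubly stochastic matrix, while $x = (1,1,1)$ attains the value 1. The matrix, the sampling scheme (random logarithms shifted to sum to 0) and the helper names are illustrative assumptions. This is a spot check, not a proof.

```python
import math
import random

def p_M_value(M, x):
    """Evaluate p_M(x) = prod_i (sum_j m_ij x_j)."""
    return math.prod(sum(m * xj for m, xj in zip(row, x)) for row in M)

def random_point_with_product_one(n, rng):
    """Random x in R^n_{++} with x_1 * ... * x_n = 1 (logs shifted to sum to 0)."""
    logs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    mean = sum(logs) / n
    return [math.exp(t - mean) for t in logs]

# An illustrative doubly stochastic matrix: rows and columns sum to 1.
M = [[0.5, 0.5, 0.0],
     [0.25, 0.25, 0.5],
     [0.25, 0.25, 0.5]]

rng = random.Random(0)
samples = [p_M_value(M, random_point_with_product_one(3, rng)) for _ in range(10000)]
print(min(samples))             # at least 1 (up to rounding), by AM-GM
print(p_M_value(M, [1, 1, 1]))  # the value 1 is attained at x = (1, 1, 1)
```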


Since $p_M(x)$ is H-stable, we may apply Gurvits’ Proposition repeatedly to
conclude that all the polynomials $q_i$ are H-stable, and obtain for each $i$ that
$$\operatorname{cap}(q_{i-1}) \ge \operatorname{cap}(q_i)\, g(\deg_i q_i) \ge \operatorname{cap}(q_i)\, g(\min\{i, \lambda_M(i)\}), \qquad (2)$$
where the second inequality follows from (1) and the fact that $g$ is
non-increasing.
Iterating (2) we get, starting with $\operatorname{cap}(p_M) = 1$, that
$$\operatorname{per} M = q_0 \ge \prod_{i=1}^n g(\min\{i, \lambda_M(i)\}) \ge \prod_{i=1}^n g(i) \qquad (3)$$
$$= \prod_{i=1}^n \Big(\frac{i-1}{i}\Big)^{i-1} = \frac{n!}{n^n},$$
which is our desired inequality.
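The telescoping identity $\prod_{i=1}^n g(i) = n!/n^n$ at the end of (3) is easy to confirm in exact arithmetic. The sketch below also checks that the matrix with all entries $\frac1n$ attains the bound, which matches the equality case of the Theorem. The range $n \le 6$ and the helper names are arbitrary choices.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial, prod

def g(k):
    # g(0) = 1, g(k) = ((k-1)/k)^(k-1) for k >= 1
    return Fraction(1) if k == 0 else Fraction(k - 1, k) ** (k - 1)

def permanent(M):
    n = len(M)
    return sum(prod(M[i][s[i]] for i in range(n)) for s in permutations(range(n)))

for n in range(1, 7):
    bound = prod(g(i) for i in range(1, n + 1))    # right-hand side of (3)
    assert bound == Fraction(factorial(n), n ** n)
    J = [[Fraction(1, n)] * n for _ in range(n)]   # all entries equal to 1/n
    assert permanent(J) == bound                   # the equality case
    print(n, bound)
```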


It remains to prove the uniqueness part. Suppose that $\operatorname{per} M = \frac{n!}{n^n}$, where
we may assume that $n \ge 2$. From the fact that we have equality in (3), we
conclude that $i \le \lambda_M(i)$ for all $i$, and hence $n = \lambda_M(n)$. By symmetry,
we conclude that all entries of $M$ are nonzero. Thus it suffices to prove, again
by symmetry, that all entries in the last column are equal to $\frac1n$.

Since we have equality in Gurvits’ Proposition applied to $p_M$ and $p_M'$, and
since $\operatorname{cap}(p_M) = 1$, we find
$$\inf p_M'(y) = \operatorname{cap}(p_M') = g(n) = \Big(\frac{n-1}{n}\Big)^{n-1},$$
where $y$ ranges over all $y \in \mathbb{R}^{n-1}_+$ with $\prod_{j=1}^{n-1} y_j = 1$. Take any such $y$.
In the following chain of inequalities the indices $i$ and $k$ range from 1 to $n$,
while $j$ goes from 1 to $n-1$. From


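$$p_M(x) = \prod_{i=1}^n\Big(\sum_{j=1}^n m_{ij}x_j\Big)$$
we infer that
$$p_M'(y) = \frac{\partial p_M(x)}{\partial x_n}\Big|_{x=(y,0)} = \sum_k m_{kn}\prod_{i\ne k}\Big(\sum_j m_{ij}y_j\Big),$$
and thus obtain the following chain:
$$\begin{aligned}
p_M'(y) &= \sum_k m_{kn}\prod_{i\ne k}\Big(\sum_j m_{ij}y_j\Big)
\ \ge\ \prod_k\prod_{i\ne k}\Big(\sum_j m_{ij}y_j\Big)^{m_{kn}}
= \prod_i\prod_{k\ne i}\Big(\sum_j m_{ij}y_j\Big)^{m_{kn}}\\
&= \prod_i\Big(\sum_j m_{ij}y_j\Big)^{1-m_{in}}
= \prod_i\Big((1-m_{in})\sum_j \frac{m_{ij}}{1-m_{in}}\,y_j\Big)^{1-m_{in}}\\
&\ge \prod_i (1-m_{in})^{1-m_{in}}\prod_i\prod_j y_j^{\frac{m_{ij}}{1-m_{in}}(1-m_{in})}
= \prod_i (1-m_{in})^{1-m_{in}}\prod_i\prod_j y_j^{m_{ij}}\\
&= \prod_i (1-m_{in})^{1-m_{in}}\prod_j y_j
= \prod_i (1-m_{in})^{1-m_{in}}
\ \ge\ \Big(\frac{n-1}{n}\Big)^{n-1}.
\end{aligned}$$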
n 
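For the last inequality in this chain we exploit the log-convexity of the
function $x^x$ for $x > 0$. For this recall that a convex function $f$ satisfies
$\frac1n\big(f(x_1)+\cdots+f(x_n)\big) \ge f\big(\frac{x_1+\cdots+x_n}{n}\big)$; a function $f(x)$ is log-convex
if $\log f(x)$ is convex. Then $\frac1n\sum_i \log f(x_i) \ge \log f\big(\frac{x_1+\cdots+x_n}{n}\big)$, or
$f(x_1)\cdots f(x_n) \ge f\big(\frac{x_1+\cdots+x_n}{n}\big)^n$. For the function $x^x$ we have
$$x_1^{x_1}\cdots x_n^{x_n} \ge \Big(\frac{x_1+\cdots+x_n}{n}\Big)^{x_1+\cdots+x_n},$$
with equality if and only if $x_1 = x_2 = \cdots = x_n$. In our case $x_i = 1 - m_{in}$,
with $x_1+\cdots+x_n = n-1$, thus $\prod_i (1-m_{in})^{1-m_{in}} \ge \big(\frac{n-1}{n}\big)^{n-1}$.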
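The inequality $x_1^{x_1}\cdots x_n^{x_n} \ge \big(\frac{x_1+\cdots+x_n}{n}\big)^{x_1+\cdots+x_n}$ can be spot-checked on random inputs. The same sketch evaluates the equality case $x_i = \frac{n-1}{n}$ that appears at the end of the chain. The sampling ranges, the seed and the helper names `lhs` and `rhs` are arbitrary assumptions.

```python
import math
import random

def lhs(xs):
    """x_1^{x_1} * ... * x_n^{x_n}"""
    return math.prod(x ** x for x in xs)

def rhs(xs):
    """((x_1 + ... + x_n) / n)^(x_1 + ... + x_n)"""
    s = sum(xs)
    return (s / len(xs)) ** s

rng = random.Random(1)
for _ in range(1000):
    xs = [rng.uniform(0.01, 3.0) for _ in range(rng.randint(2, 6))]
    assert lhs(xs) >= rhs(xs) * (1 - 1e-12)   # small slack for rounding

# Equality case from the proof: all x_i = 1 - m_in = (n-1)/n.
n = 5
xs = [(n - 1) / n] * n
print(lhs(xs), ((n - 1) / n) ** (n - 1))
```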


And here is the punch line: Since this chain of inequalities holds for every
such $y$, and since $\inf p_M'(y) = \big(\frac{n-1}{n}\big)^{n-1}$, the last inequality (which is
independent of $y$) must be an equality, and from this we conclude that
$1 - m_{1n} = \cdots = 1 - m_{nn}$, that is, $m_{1n} = \cdots = m_{nn} = \frac1n$.


Here we make use of the Leibniz rule
for differentiating products:
$$(f_1f_2\cdots f_n)' = \sum_{i=1}^n f_i'\prod_{k\ne i} f_k.$$

Note that $m_{in} \ne 1$, since $n \ge 2$ and the
$i$-th row has no zeros.

The function $f(x) = x^x$ is log-convex.
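Since $\operatorname{Re}(x_j) > 0$ for all $j$,
we have $1 + \operatorname{Re}(b_i) \le 0$.

Suppose $y \in \mathbb{C}^{n-1}_{++}$ and set $\hat y = \lambda y$,
where $\lambda = \big[\prod_{j=1}^{n-1}\operatorname{Re}(y_j)\big]^{-1/(n-1)}$. Then
$\prod_j \operatorname{Re}(\hat y_j) = 1$, and $p'(\hat y) = \lambda^{n-1}p'(y)$. Hence $y$ is a root of $p'$ if
and only if $\hat y$ is.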


In our work towards a proof of Gurvits’ Proposition, assume now that
the polynomial $p(x) \in \mathbb{R}_+[x_1,\dots,x_n]$ is H-stable and homogeneous of
degree $n$. We already know that $p'(x_1,\dots,x_{n-1})$ is homogeneous of
degree $n-1$.


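Lemma 1. For each $x \in \mathbb{C}^n_+$,
$$|p(x)| \ge |p(\operatorname{Re}(x))|.$$

We may assume that $x \in \mathbb{C}^n_{++}$, by continuity. Since $p(x)$ is H-stable, we
have $p(\operatorname{Re}(x)) \ne 0$. Fix $x$ and consider $p(x + s\operatorname{Re}(x))$
as a function of $s \in \mathbb{C}$. As $p(x)$ is homogeneous of degree $n$, the coefficient of $s^n$ is $p(\operatorname{Re}(x))$, so we may write
$$p(x + s\operatorname{Re}(x)) = p(\operatorname{Re}(x))\prod_{i=1}^n (s - b_i)$$
for some complex numbers $b_1,\dots,b_n$. Since $p(x + b_i\operatorname{Re}(x)) = 0$ for
each $i$, we infer that $x + b_i\operatorname{Re}(x) \notin \mathbb{C}^n_{++}$, which implies that for some $j$
$$\operatorname{Re}\big(x_j + b_i\operatorname{Re}(x_j)\big) = \operatorname{Re}(x_j)\big(1 + \operatorname{Re}(b_i)\big) \le 0,$$
hence $\operatorname{Re}(b_i) \le -1$, and thus $|b_i| \ge 1$.

It follows that
$$|p(x)| = |p(x + 0\cdot\operatorname{Re}(x))| = |p(\operatorname{Re}(x))|\prod_{i=1}^n |b_i| \ge |p(\operatorname{Re}(x))|,$$
as claimed.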
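For polynomials of the form $p_M$ with a nonnegative matrix $M$, Lemma 1 can be seen factor by factor: $|\sum_j m_{ij}x_j| \ge \operatorname{Re}(\sum_j m_{ij}x_j) = \sum_j m_{ij}\operatorname{Re}(x_j)$. The sketch below samples random points of the closed half-plane $\mathbb{C}^3_+$ and records the smallest value of $|p(x)| - |p(\operatorname{Re}(x))|$. The matrix, the sampling ranges and the seed are illustrative assumptions. The check covers only this one polynomial, not general H-stable polynomials.

```python
import math
import random

def p_M_value(M, x):
    """Evaluate p_M(x) = prod_i (sum_j m_ij x_j); works for complex x too."""
    return math.prod(sum(m * xj for m, xj in zip(row, x)) for row in M)

M = [[0.5, 0.5, 0.0],
     [0.25, 0.25, 0.5],
     [0.25, 0.25, 0.5]]

rng = random.Random(2)
worst = math.inf
for _ in range(10000):
    # a random point of the closed right half-plane C^3_+ (real parts >= 0)
    x = [complex(rng.uniform(0.0, 2.0), rng.uniform(-5.0, 5.0)) for _ in range(3)]
    re_x = [z.real for z in x]
    gap = abs(p_M_value(M, x)) - abs(p_M_value(M, re_x))
    worst = min(worst, gap)
print(worst)   # nonnegative up to rounding, as Lemma 1 predicts
```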


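Lemma 2. Let $y \in \mathbb{C}^{n-1}_+$ and $\prod_{j=1}^{n-1}\operatorname{Re}(y_j) = 1$. Then
$$\operatorname{cap}(p) \le \frac{p(\operatorname{Re}(y), t)}{t}$$
for every $t > 0$.

For the proof set $\lambda := t^{-1/n}$ and $\tilde x := \lambda(\operatorname{Re}(y), t) \in \mathbb{R}^n_+$. Then
$$\prod_{i=1}^n \tilde x_i = \lambda^n\Big(\prod_{j=1}^{n-1}\operatorname{Re}(y_j)\Big)t = 1,$$
and thus, using that $p(x)$ is homogeneous of degree $n$,
$$\operatorname{cap}(p) \le p(\tilde x) = \lambda^n p(\operatorname{Re}(y), t) = \frac{p(\operatorname{Re}(y), t)}{t}.$$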


Proof of Gurvits’ Proposition. It now suffices to show (by scaling, see
the margin) that for $y \in \mathbb{C}^{n-1}_{++}$ with $\prod_{j=1}^{n-1}\operatorname{Re}(y_j) = 1$ the following two
conditions hold:

(I) If $p'(y) = 0$, then $p' \equiv 0$.

(II) If $y \in \mathbb{R}^{n-1}_+$, then $p'(y) \ge \operatorname{cap}(p)\cdot g(\deg_n p)$.
Case 1. $p(y, 0) = 0$.

We have $p(\operatorname{Re}(y), 0) = 0$ by Lemma 1. Furthermore, remembering that
$p'(y) = \frac{\partial p(y, x_n)}{\partial x_n}\big|_{x_n=0}$, we have
$$p'(y) = \lim_{t\to 0}\frac{p(y,t) - p(y,0)}{t} = \lim_{t\to 0}\frac{p(y,t)}{t},$$
and similarly
$$p'(\operatorname{Re}(y)) = \lim_{t\to 0}\frac{p(\operatorname{Re}(y),t)}{t}.$$
Since $p(\operatorname{Re}(y),t) \le |p(y,t)|$, again by Lemma 1, we infer from Lemma 2
that
$$\operatorname{cap}(p) \le \lim_{t\to 0}\frac{p(\operatorname{Re}(y),t)}{t} = p'(\operatorname{Re}(y)) \le \lim_{t\to 0}\frac{|p(y,t)|}{t} = |p'(y)|.$$

If $p'(y) = 0$, then $p'(\operatorname{Re}(y)) = 0$, and thus $p' \equiv 0$, since all coefficients
of $p'$ are nonnegative and all entries of $\operatorname{Re}(y)$ are positive. This proves (I) in this case, and (II) is trivially true
since $g(k) \le 1$ for all $k$.


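Case 2. $p(y,t)$ has degree at most 1 as a polynomial in $t$.

Since $p(\operatorname{Re}(y),t) \le |p(y,t)|$ for all $t > 0$ by Lemma 1, we conclude that
$p(\operatorname{Re}(y),t)$ also has degree at most 1 in $t$. Thus
$$p'(y) = \lim_{t\to\infty}\frac{p(y,t)}{t}, \qquad p'(\operatorname{Re}(y)) = \lim_{t\to\infty}\frac{p(\operatorname{Re}(y),t)}{t},$$
and Lemma 2 tells us that
$$\operatorname{cap}(p) \le \lim_{t\to\infty}\frac{p(\operatorname{Re}(y),t)}{t} = p'(\operatorname{Re}(y)) \le \lim_{t\to\infty}\frac{|p(y,t)|}{t} = |p'(y)|,$$
and we infer (I) and (II) as before.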
Case 3. $p(y,0) \ne 0$, and $p(y,t)$ has degree at least 2 in $t$.

This implies that $k := \deg_n p \ge 2$, and we can write
$$p(y,t) = p(y,0)\prod_{i=1}^k (1 + a_i t) \qquad (4)$$
for some complex numbers $a_1,\dots,a_k$. Hence
$$p'(y) = p(y,0)\sum_{i=1}^k a_i,$$
where not all $a_i$ are equal to 0, since $p(y,t)$ has degree at least 2 in $t$.

The following result is the heart of the proof.

Claim. If $a_i \ne 0$, then the inverse $a_i^{-1}$ is a nonnegative linear combination
of the complex numbers $y_1,\dots,y_{n-1}$.
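To see this we need the famous Lemma of Farkas from linear optimization.
(See for example Schrijver [8, Sect. 7.3].)

Farkas Lemma. Let $A \in \mathbb{R}^{r\times s}$ be a real matrix and $b \in \mathbb{R}^r$ a
vector. Then exactly one of the following alternatives holds:

(i) $Ax = b$, $x \in \mathbb{R}^s$, $x \ge 0$ is solvable,

(ii) $A^T z \ge 0$, $z \in \mathbb{R}^r$, $b^T z < 0$ is solvable.

For our problem take $r = 2$, $s = n-1$, and such an $a_i \ne 0$, and set
$$A = \begin{pmatrix}\operatorname{Re}(y_1) & \cdots & \operatorname{Re}(y_{n-1})\\ \operatorname{Im}(y_1) & \cdots & \operatorname{Im}(y_{n-1})\end{pmatrix}, \qquad b = \begin{pmatrix}\operatorname{Re}(a_i^{-1})\\ \operatorname{Im}(a_i^{-1})\end{pmatrix}.$$
Alternative (i) is exactly what we want:
$$a_i^{-1} = x_1y_1 + \cdots + x_{n-1}y_{n-1}, \qquad x \in \mathbb{R}^{n-1}_+. \qquad (5)$$
Assume the opposite: Let $z = \binom{c}{d}$ be a solution of (ii) and set $\lambda = c - id \in \mathbb{C}$.
Then
$$(A^T z)_j = \operatorname{Re}(y_j)\operatorname{Re}(\lambda) - \operatorname{Im}(y_j)\operatorname{Im}(\lambda) = \operatorname{Re}(\lambda y_j) \ge 0,$$
$$b^T z = \operatorname{Re}(a_i^{-1}\lambda) < 0.$$
This means that $(\lambda y, -\lambda a_i^{-1})$ lies in $\mathbb{C}^n_+$. (Replacing $\lambda$ by $\lambda + \varepsilon$ for a small real $\varepsilon > 0$ keeps $\operatorname{Re}(\lambda a_i^{-1}) < 0$ and, since $\operatorname{Re}(y_j) > 0$, makes every $\operatorname{Re}(\lambda y_j)$ strictly positive; so we may even assume that this point lies in $\mathbb{C}^n_{++}$.) However, by (4)
$$p(\lambda y, -\lambda a_i^{-1}) = \lambda^n p(y, -a_i^{-1}) = 0,$$
contradicting the H-stability of $p(x)$. This proves the claim.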

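Since $y \in \mathbb{C}^{n-1}_{++}$, we have $\operatorname{Re}(a_i^{-1}) > 0$ in (5) and thus $\operatorname{Re}(a_i) > 0$ for all
nonzero $a_i$. Looking at (4) and the resulting formula $p'(y) = p(y,0)\sum_{i=1}^k a_i$ with $p(y,0) \ne 0$, we conclude that $p'(y) \ne 0$, which proves (I).

To see (II), pick $y \in \mathbb{R}^{n-1}_+$ with $\prod_{j=1}^{n-1} y_j = 1$. In this case all nonzero $a_i$
are positive reals by (5), thus $\sum_{i=1}^k a_i > 0$, and $\frac{p'(y)}{p(y,0)} = \sum_{i=1}^k a_i > 0$.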


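Set $t := \frac{k}{k-1}\cdot\frac{p(y,0)}{p'(y)} > 0$. Using once more the AM-GM inequality we infer
$$\frac{p(y,t)}{p(y,0)} = \prod_{i=1}^k (1 + a_i t) \le \Big(\frac1k\sum_{i=1}^k (1 + a_i t)\Big)^k = \Big(1 + \frac tk\cdot\frac{p'(y)}{p(y,0)}\Big)^k = \Big(1 + \frac{1}{k-1}\Big)^k.$$
Lemma 2 applied to this $t$ therefore gives
$$\operatorname{cap}(p) \le \frac{p(y,t)}{t} \le \frac{p(y,0)}{t}\Big(\frac{k}{k-1}\Big)^k = p'(y)\,\frac{k-1}{k}\Big(\frac{k}{k-1}\Big)^k = p'(y)\Big(\frac{k}{k-1}\Big)^{k-1} = \frac{p'(y)}{g(k)},$$
or $p'(y) \ge \operatorname{cap}(p)\cdot g(k)$, and the proof is complete.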
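The choice of $t$ in Case 3 is the one that makes the final bound as sharp as possible. Put $u = t\,p'(y)/p(y,0)$. The estimate then reads $\operatorname{cap}(p) \le p'(y)\,\varphi(u)$ with $\varphi(u) = (1 + u/k)^k/u$. The sketch below checks numerically that $u = \frac{k}{k-1}$ gives $\varphi(u) = 1/g(k)$ and that no point on a grid does better. The grid and the helper name `phi` are illustrative assumptions.

```python
def g(k):
    return 1.0 if k == 0 else ((k - 1) / k) ** (k - 1)

def phi(u, k):
    """(1 + u/k)^k / u: the factor multiplying p'(y) after AM-GM, u = t p'(y)/p(y,0)."""
    return (1 + u / k) ** k / u

grid = [0.01 * i for i in range(1, 2000)]
for k in range(2, 9):
    u_star = k / (k - 1)                          # corresponds to the t chosen in the text
    assert abs(phi(u_star, k) - 1 / g(k)) < 1e-12
    assert min(phi(u, k) for u in grid) >= phi(u_star, k) - 1e-12
    print(k, phi(u_star, k), 1 / g(k))
```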
“An H-stable crowd views C++”


On a lemma 
of Littlewood and Offord 


In their work on the distribution of roots of algebraic equations, Littlewood 
and Offord proved in 1943 the following result: 


Let $a_1, a_2, \dots, a_n$ be complex numbers with $|a_i| \ge 1$ for all $i$, and
consider the $2^n$ linear combinations $\sum_{i=1}^n \varepsilon_i a_i$ with $\varepsilon_i \in \{1, -1\}$.
Then the number of sums $\sum_{i=1}^n \varepsilon_i a_i$ which lie in the interior of any
circle of radius 1 is not greater than
$$c\,\frac{2^n}{\sqrt n}\,\log n \quad\text{for some constant } c > 0.$$


A few years later Paul Erdős improved this bound by removing the $\log n$
term, but what is more interesting, he showed that this is, in fact, a simple
consequence of the theorem of Sperner (see page 213).

To get a feeling for his argument, let us look at the case when all $a_i$ are
real. We may assume that all $a_i$ are positive (by changing $a_i$ to $-a_i$ and $\varepsilon_i$
to $-\varepsilon_i$ whenever $a_i < 0$). Now suppose that a set of combinations $\sum \varepsilon_i a_i$
lies in the interior of an interval of length 2. Let $N = \{1, 2, \dots, n\}$ be the
index set. For every $\sum \varepsilon_i a_i$ we set $I := \{i \in N : \varepsilon_i = 1\}$. Now if $I \subsetneq I'$
for two such sets, then we conclude that
$$\sum \varepsilon_i' a_i - \sum \varepsilon_i a_i = 2\sum_{i \in I'\setminus I} a_i \ge 2,$$
which is a contradiction. Hence the sets $I$ form an antichain, and we
conclude from the theorem of Sperner that there are at most $\binom{n}{\lfloor n/2\rfloor}$ such
combinations. By Stirling’s formula (see page 13) we have
$$\binom{n}{\lfloor n/2\rfloor} \le c\,\frac{2^n}{\sqrt n} \quad\text{for some } c > 0.$$

For $n$ even and all $a_i = 1$ we obtain $\binom{n}{n/2}$ combinations $\sum_{i=1}^n \varepsilon_i a_i$ that
sum to 0. Looking at the interval $(-1, 1)$ we thus find that the binomial
number gives the exact bound.
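For small $n$ the real case can be checked by brute force. The sketch lists all $2^n$ sums, sorts them, and uses a sliding window to find the largest number of them that fit into one open interval of length 2. It then compares that number with $\binom{n}{\lfloor n/2\rfloor}$. The example vectors are arbitrary integers with $|a_i| \ge 1$. Integers are used so that no rounding can blur the boundary case of a difference exactly 2. The function name is hypothetical.

```python
from itertools import product
from math import comb

def max_sums_in_open_interval(a, length=2):
    """Largest number of the 2^n sums sum(eps_i * a_i), eps_i in {1, -1}, that fit
    into one open interval of the given length (i.e. with max - min < length)."""
    sums = sorted(sum(e * ai for e, ai in zip(eps, a))
                  for eps in product((1, -1), repeat=len(a)))
    best, left = 0, 0
    for right in range(len(sums)):          # sliding window over the sorted sums
        while sums[right] - sums[left] >= length:
            left += 1
        best = max(best, right - left + 1)
    return best

examples = ([1] * 8, [1, 1, 2, 3, 1, 2, 1, 1], [1, -1, 2, -3, 1, 1, -2])
for a in examples:
    n = len(a)
    print(n, max_sums_in_open_interval(a), comb(n, n // 2))
```

The first example is the extremal case from the text, all $a_i = 1$ with $n$ even.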

In the same paper Erdős conjectured that $\binom{n}{\lfloor n/2\rfloor}$ was the right bound for
complex numbers as well (he could only prove $c\,2^n n^{-1/2}$ for some $c$) and
indeed that the same bound is valid for vectors $a_1, \dots, a_n$ with $|a_i| \ge 1$ in
a real Hilbert space, when the circle of radius 1 is replaced by an open ball
of radius 1.
Sperner’s theorem. Any antichain of
subsets of an $n$-set has size at most $\binom{n}{\lfloor n/2\rfloor}$.

Erdős was right, but it took twenty years until Gyula Katona and Daniel
Kleitman independently came up with a proof for the complex numbers
(or, what is the same, for the plane $\mathbb{R}^2$). Their proofs used explicitly the
2-dimensionality of the plane, and it was not at all clear how they could be
extended to cover finite dimensional real vector spaces.

But then in 1970 Kleitman proved the full conjecture on Hilbert spaces 
with an argument of stunning simplicity. In fact, he proved even more. His 
argument is a prime example of what you can do when you find the right 
induction hypothesis. 

A word of comfort for all readers who are not familiar with the notion of
a Hilbert space: We do not really need general Hilbert spaces. Since we
only deal with finitely many vectors $a_i$, it is enough to consider the real
space $\mathbb{R}^d$ with the usual scalar product. Here is Kleitman’s result.


Theorem. Let $a_1, \dots, a_n$ be vectors in $\mathbb{R}^d$, each of length
at least 1, and let $R_1, \dots, R_k$ be $k$ open regions of $\mathbb{R}^d$, where
$|x - y| < 2$ for any $x, y$ that lie in the same region $R_i$.

Then the number of linear combinations $\sum_{i=1}^n \varepsilon_i a_i$, $\varepsilon_i \in \{1, -1\}$,
that can lie in the union $\bigcup_i R_i$ of the regions is at most the sum of
the $k$ largest binomial coefficients $\binom{n}{j}$.

In particular, we get the bound $\binom{n}{\lfloor n/2\rfloor}$ for $k = 1$.


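Before turning to the proof note that the bound is exact for
$$a_1 = \cdots = a_n = a = (1, 0, \dots, 0)^T.$$
Indeed, for even $n$ we obtain $\binom{n}{n/2}$ sums equal to 0, $\binom{n}{n/2-1}$ sums equal to
$(-2)a$, $\binom{n}{n/2+1}$ sums equal to $2a$, and so on. Choosing balls of radius 1
around
$$-2\big\lceil\tfrac{k-1}{2}\big\rceil a,\ \dots,\ (-2)a,\ 0,\ 2a,\ \dots,\ 2\big\lfloor\tfrac{k-1}{2}\big\rfloor a,$$
we obtain
$$\binom{n}{\frac n2 - \lceil\frac{k-1}{2}\rceil} + \cdots + \binom{n}{\frac n2 - 1} + \binom{n}{\frac n2} + \binom{n}{\frac n2 + 1} + \cdots + \binom{n}{\frac n2 + \lfloor\frac{k-1}{2}\rfloor}$$
sums lying in these $k$ balls, and this is our promised expression, since the
largest binomial coefficients are centered around the middle (see page 14).
A similar reasoning works when $n$ is odd.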
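The extremal example can be counted directly. Since every $a_i$ equals $a = (1, 0, \dots, 0)^T$, only the first coordinate matters, and a ball of radius 1 around a center becomes an open interval. The sketch counts the sums in the $k$ balls described above for $n = 10$ and compares the result with the sum of the $k$ largest binomial coefficients. The choice $n = 10$, $k \le 5$ and the helper names are arbitrary.

```python
from itertools import product
from math import comb

def count_in_balls(n, centers):
    """Count the sums (eps_1 + ... + eps_n) * a, a = (1, 0, ..., 0), that lie in an
    open ball of radius 1 around one of the centers (first coordinates only)."""
    count = 0
    for eps in product((1, -1), repeat=n):
        s = sum(eps)
        if any(abs(s - c) < 1 for c in centers):
            count += 1
    return count

def k_largest_binomials(n, k):
    return sum(sorted((comb(n, j) for j in range(n + 1)), reverse=True)[:k])

n = 10
results = []
for k in range(1, 6):
    # centers -2*ceil((k-1)/2), ..., -2, 0, 2, ..., 2*floor((k-1)/2): k points in all
    centers = [2 * i for i in range(-(k // 2), (k - 1) // 2 + 1)]
    results.append((k, count_in_balls(n, centers), k_largest_binomials(n, k)))
for row in results:
    print(row)
```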


Proof. We may assume, without loss of generality, that the regions $R_i$
are disjoint, and will do so from now on. The key to the proof is the recursion
of the binomial coefficients, which tells us how the largest binomial
coefficients of $n$ and $n-1$ are related. Set $r = \big\lfloor\frac{n-k+1}{2}\big\rfloor$, $s = \big\lfloor\frac{n+k-1}{2}\big\rfloor$,
then $\binom{n}{r}, \binom{n}{r+1}, \dots, \binom{n}{s}$ are the $k$ largest binomial coefficients for $n$. The
recursion $\binom{n}{i} = \binom{n-1}{i} + \binom{n-1}{i-1}$ implies
$$\begin{aligned}
\sum_{i=r}^{s}\binom{n}{i} &= \sum_{i=r}^{s}\binom{n-1}{i} + \sum_{i=r}^{s}\binom{n-1}{i-1}
= \sum_{i=r}^{s}\binom{n-1}{i} + \sum_{i=r-1}^{s-1}\binom{n-1}{i}\\
&= \sum_{i=r-1}^{s}\binom{n-1}{i} + \sum_{i=r}^{s-1}\binom{n-1}{i},
\end{aligned} \qquad (1)$$
and an easy calculation shows that the first sum adds the $k+1$ largest
binomial coefficients $\binom{n-1}{i}$, and the second sum the largest $k-1$.
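The "easy calculation" behind (1) can be confirmed mechanically for small parameters. For each $n$ and $k$, the sketch below computes $r$ and $s$ as defined above and checks four things. The window $r..s$ carries the $k$ largest coefficients for $n$, identity (1) holds, and the two sums on the right are the $k+1$ and the $k-1$ largest coefficients for $n-1$. The helper `C` treats binomial coefficients outside $0 \le i \le m$ as 0, an assumption that matters only at the boundary case $k = n$.

```python
from math import comb

def C(m, i):
    """Binomial coefficient, taken to be 0 outside 0 <= i <= m."""
    return comb(m, i) if 0 <= i <= m else 0

def top_k_sum(m, k):
    """Sum of the k largest binomial coefficients C(m, 0), ..., C(m, m)."""
    return sum(sorted((C(m, i) for i in range(m + 1)), reverse=True)[:k])

def check_recursion(n, k):
    r, s = (n - k + 1) // 2, (n + k - 1) // 2
    lhs = sum(C(n, i) for i in range(r, s + 1))
    first = sum(C(n - 1, i) for i in range(r - 1, s + 1))
    second = sum(C(n - 1, i) for i in range(r, s))
    assert lhs == top_k_sum(n, k)              # r..s carry the k largest for n
    assert lhs == first + second               # identity (1)
    assert first == top_k_sum(n - 1, k + 1)    # the k+1 largest for n-1
    assert second == top_k_sum(n - 1, k - 1)   # the k-1 largest for n-1

for n in range(2, 16):
    for k in range(1, n + 1):
        check_recursion(n, k)
print("identity (1) checked for 2 <= n <= 15 and 1 <= k <= n")
```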

Kleitman’s proof proceeds by induction on $n$, the case $n = 1$ being trivial.
In the light of (1) we need only show for the induction step that the linear
combinations of $a_1, \dots, a_n$ that lie in $k$ disjoint regions can be mapped
bijectively onto combinations of $a_1, \dots, a_{n-1}$ that lie in $k+1$ or $k-1$
regions.


Claim. At least one of the translated regions $R_j - a_n$ is disjoint
from all the translated regions $R_1 + a_n, \dots, R_k + a_n$.

To prove this, consider the hyperplane $H = \{x : \langle a_n, x\rangle = c\}$ orthogonal
to $a_n$, which contains all translates $R_i + a_n$ on the side that is given by
$\langle a_n, x\rangle \ge c$, and which touches the closure of some region, say $R_j + a_n$.
Such a hyperplane exists since the regions are bounded. Now $|x - y| < 2$
holds for any $x \in R_j$ and $y$ in the closure of $R_j$, since $R_j$ is open. We want
to show that $R_j - a_n$ lies on the other side of $H$. Suppose, on the contrary,
that $\langle a_n, x - a_n\rangle \ge c$ for some $x \in R_j$, that is, $\langle a_n, x\rangle \ge |a_n|^2 + c$.

Let $y + a_n$ be a point where $H$ touches $R_j + a_n$, then $y$ is in the closure
of $R_j$, and $\langle a_n, y + a_n\rangle = c$, that is, $\langle a_n, -y\rangle = |a_n|^2 - c$. Hence
$$\langle a_n, x - y\rangle \ge 2|a_n|^2,$$
and we infer from the Cauchy–Schwarz inequality
$$2|a_n|^2 \le \langle a_n, x - y\rangle \le |a_n|\,|x - y|,$$
and thus (with $|a_n| \ge 1$) we get $2 \le 2|a_n| \le |x - y|$, a contradiction.

The rest is easy. We classify the combinations $\sum \varepsilon_i a_i$ which come to lie in
$R_1 \cup \cdots \cup R_k$ as follows. Into Class 1 we put all $\sum_{i=1}^n \varepsilon_i a_i$ with $\varepsilon_n = -1$
and all $\sum_{i=1}^n \varepsilon_i a_i$ with $\varepsilon_n = 1$ lying in $R_j$, and into Class 2 we throw
in the remaining combinations $\sum_{i=1}^n \varepsilon_i a_i$ with $\varepsilon_n = 1$, not in $R_j$. It
follows that the combinations $\sum_{i=1}^{n-1} \varepsilon_i a_i$ corresponding to Class 1 lie in
the $k+1$ disjoint regions $R_1 + a_n, \dots, R_k + a_n$ and $R_j - a_n$, and the
combinations $\sum_{i=1}^{n-1} \varepsilon_i a_i$ corresponding to Class 2 lie in the $k-1$ disjoint
regions $R_1 - a_n, \dots, R_k - a_n$ without $R_j - a_n$. By induction, Class 1 contains
at most $\sum_{i=r-1}^{s}\binom{n-1}{i}$ combinations, while Class 2 contains at most
$\sum_{i=r}^{s-1}\binom{n-1}{i}$ combinations, and by (1) this is the whole proof, straight
from The Book.




References 


[1] P. ERDŐS: On a lemma of Littlewood and Offord, Bulletin Amer. Math. Soc.
51 (1945), 898-902.

[2] G. KATONA: On a conjecture of Erdős and a stronger form of Sperner’s
theorem, Studia Sci. Math. Hungar. 1 (1966), 59-63.

[3] D. KLEITMAN: On a lemma of Littlewood and Offord on the distribution of
certain sums, Math. Zeitschrift 90 (1965), 251-259.

[4] D. KLEITMAN: On a lemma of Littlewood and Offord on the distributions of
linear combinations of vectors, Advances Math. 5 (1970), 155-157.

[5] J. E. LITTLEWOOD & A. C. OFFORD: On the number of real roots of a
random algebraic equation III, Mat. USSR Sb. 12 (1943), 277-285.


Cotangent and the Herglotz trick 


What is the most interesting formula involving elementary functions? In
his beautiful article [2], whose exposition we closely follow, Jürgen Elstrodt
nominates as a first candidate the partial fraction expansion of the cotangent
function:
$$\pi\cot\pi x = \frac1x + \sum_{n=1}^\infty\Big(\frac{1}{x+n} + \frac{1}{x-n}\Big) \qquad (x \in \mathbb{R}\setminus\mathbb{Z}).$$

This elegant formula was proved by Euler in §178 of his Introductio in
Analysin Infinitorum from 1748 and it certainly counts among his finest
achievements. We can also write it even more elegantly as
$$\pi\cot\pi x = \lim_{N\to\infty}\sum_{n=-N}^{N}\frac{1}{x+n}, \qquad (1)$$
but one has to note that the evaluation of the sum $\sum_{n\in\mathbb{Z}}\frac{1}{x+n}$ is a bit
dangerous, since the sum is only conditionally convergent, so its value
depends on the “right” order of summation.
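A short numerical experiment shows why the order of summation in (1) matters. The symmetric partial sums $\sum_{n=-N}^{N}\frac{1}{x+n}$ approach $\pi\cot\pi x$. If the same terms are instead summed from $n = -N$ to $n = 2N$, the extra terms $\sum_{n=N+1}^{2N}\frac{1}{x+n}$ contribute about $\log 2$ in the limit. The point $x = 0.3$ and the helper names are arbitrary choices for this sketch.

```python
import math

def g_N(x, N):
    """Symmetric partial sum: sum_{n=-N}^{N} 1/(x+n)."""
    return sum(1.0 / (x + n) for n in range(-N, N + 1))

def lopsided(x, N):
    """The same terms, but summed from n = -N up to n = 2N."""
    return sum(1.0 / (x + n) for n in range(-N, 2 * N + 1))

x = 0.3
target = math.pi / math.tan(math.pi * x)     # pi * cot(pi x)
for N in (10, 100, 1000, 10000):
    print(N, g_N(x, N) - target, lopsided(x, N) - target)
# the first difference tends to 0, the second to log 2 = 0.6931...
```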


We shall derive (1) by an argument of stunning simplicity which is 
attributed to Gustav Herglotz — the “Herglotz trick.” To get started, set 


$$f(x) := \pi\cot\pi x, \qquad g(x) := \lim_{N\to\infty}\sum_{n=-N}^{N}\frac{1}{x+n},$$
and let us try to derive enough common properties of these functions to see
in the end that they must coincide ...


(A) The functions $f$ and $g$ are defined for all non-integral values and are
continuous there.

For the cotangent function $f(x) = \pi\cot\pi x = \pi\frac{\cos\pi x}{\sin\pi x}$, this is clear (see
the figure). For $g(x)$, we first use the identity $\frac{1}{x+n} + \frac{1}{x-n} = -\frac{2x}{n^2-x^2}$ to
rewrite Euler’s formula as
$$\pi\cot\pi x = \frac1x - \sum_{n=1}^\infty\frac{2x}{n^2-x^2}. \qquad (2)$$
Thus for (A) we have to prove that for every $x \notin \mathbb{Z}$ the series
$$\sum_{n=1}^\infty\frac{1}{n^2-x^2}$$
converges uniformly in a neighborhood of $x$.
Gustav Herglotz

The function $f(x) = \pi\cot\pi x$

Addition theorems:
$$\sin(x+y) = \sin x\cos y + \cos x\sin y$$
$$\cos(x+y) = \cos x\cos y - \sin x\sin y$$
$$\Longrightarrow\quad \sin\big(x + \tfrac\pi2\big) = \cos x, \qquad \cos\big(x + \tfrac\pi2\big) = -\sin x,$$
$$\sin x = 2\sin\tfrac x2\cos\tfrac x2, \qquad \cos x = \cos^2\tfrac x2 - \sin^2\tfrac x2.$$

For this, we don’t get any problem with the first term, for $n = 1$, or with
the terms with $2n - 1 \le x^2$, since there is only a finite number of them. On
the other hand, for $n \ge 2$ and $2n - 1 > x^2$, that is $n^2 - x^2 > (n-1)^2 > 0$,
the summands are bounded by
$$0 < \frac{1}{n^2 - x^2} < \frac{1}{(n-1)^2},$$
and this bound is not only true for $x$ itself, but also for values in a neighborhood
of $x$. Finally the fact that $\sum\frac{1}{(n-1)^2}$ converges (to $\frac{\pi^2}{6}$, see page 55)
provides the uniform convergence needed for the proof of (A).


(B) Both $f$ and $g$ are periodic of period 1, that is, $f(x+1) = f(x)$ and
$g(x+1) = g(x)$ hold for all $x \in \mathbb{R}\setminus\mathbb{Z}$.

Since the cotangent has period $\pi$, we find that $f$ has period 1 (see again the
figure above). For $g$ we argue as follows. Let
$$g_N(x) := \sum_{n=-N}^{N}\frac{1}{x+n},$$
then
$$g_N(x+1) = \sum_{n=-N}^{N}\frac{1}{x+1+n} = \sum_{n=-N+1}^{N+1}\frac{1}{x+n} = g_{N-1}(x) + \frac{1}{x+N} + \frac{1}{x+N+1}.$$
Hence $g(x+1) = \lim_{N\to\infty} g_N(x+1) = \lim_{N\to\infty} g_{N-1}(x) = g(x)$.


(C) Both $f$ and $g$ are odd functions, that is, we have $f(-x) = -f(x)$ and
$g(-x) = -g(x)$ for all $x \in \mathbb{R}\setminus\mathbb{Z}$.

The function $f$ obviously has this property, and for $g$ we just have to
observe that $g_N(-x) = -g_N(x)$.


The final two facts constitute the Herglotz trick: First we show that f and g 
satisfy the same functional equation, and secondly that h := f — g can be 
continuously extended to all of R. 


(D) The two functions $f$ and $g$ satisfy the same functional equation:
$f\big(\frac x2\big) + f\big(\frac{x+1}{2}\big) = 2f(x)$ and $g\big(\frac x2\big) + g\big(\frac{x+1}{2}\big) = 2g(x)$.


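For $f(x)$ this results from the addition theorems for the sine and cosine
functions:
$$f\Big(\frac x2\Big) + f\Big(\frac{x+1}{2}\Big) = \pi\Big[\frac{\cos\frac{\pi x}{2}}{\sin\frac{\pi x}{2}} - \frac{\sin\frac{\pi x}{2}}{\cos\frac{\pi x}{2}}\Big] = 2\pi\,\frac{\cos\big(\frac{\pi x}{2} + \frac{\pi x}{2}\big)}{\sin\big(\frac{\pi x}{2} + \frac{\pi x}{2}\big)} = 2f(x).$$

The functional equation for $g$ follows from
$$g_N\Big(\frac x2\Big) + g_N\Big(\frac{x+1}{2}\Big) = 2g_{2N}(x) + \frac{2}{x + 2N + 1},$$
which in turn follows from
$$\frac{1}{\frac x2 + n} + \frac{1}{\frac{x+1}{2} + n} = 2\Big(\frac{1}{x+2n} + \frac{1}{x+2n+1}\Big).$$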
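The finite identity $g_N\big(\frac x2\big) + g_N\big(\frac{x+1}{2}\big) = 2g_{2N}(x) + \frac{2}{x+2N+1}$ involves only finitely many rational terms. It can therefore be verified exactly with rational arithmetic for sample values of $x$ and $N$. The chosen sample points are arbitrary non-integral rationals for which no denominator vanishes.

```python
from fractions import Fraction

def g_N(x, N):
    """sum_{n=-N}^{N} 1/(x+n), computed exactly for rational non-integral x."""
    return sum(1 / (x + n) for n in range(-N, N + 1))

for x in (Fraction(1, 3), Fraction(2, 7), Fraction(-5, 4)):
    for N in range(0, 6):
        lhs = g_N(x / 2, N) + g_N((x + 1) / 2, N)
        rhs = 2 * g_N(x, 2 * N) + Fraction(2) / (x + 2 * N + 1)
        assert lhs == rhs
print("identity holds exactly for the sampled x and N")
```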


Now let us look at
$$h(x) = f(x) - g(x) = \pi\Big(\cot\pi x - \frac{1}{\pi x}\Big) + \sum_{n=1}^\infty\frac{2x}{n^2-x^2}. \qquad (3)$$

We know by now that $h$ is a continuous function on $\mathbb{R}\setminus\mathbb{Z}$ that satisfies the
properties (B), (C), (D). What happens at the integral values? From the sine
and cosine series expansions, or by applying de l’Hospital’s rule twice, we
find
$$\lim_{x\to 0}\Big(\cot x - \frac1x\Big) = \lim_{x\to 0}\frac{x\cos x - \sin x}{x\sin x} = 0,$$
and hence also
$$\lim_{x\to 0}\Big(\pi\cot\pi x - \frac1x\Big) = 0.$$
But since the last sum $\sum_{n=1}^\infty\frac{2x}{n^2-x^2}$ in (3) converges to 0 with $x \to 0$, we
have in fact $\lim_{x\to 0} h(x) = 0$, and thus by periodicity
$$\lim_{x\to n} h(x) = 0 \quad\text{for all } n \in \mathbb{Z}.$$

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots, \qquad \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots$$


In summary, we have shown the following:

(E) By setting $h(x) := 0$ for $x \in \mathbb{Z}$, $h$ becomes a continuous function
on all of $\mathbb{R}$ that shares the properties given in (B), (C) and (D).

We are ready for the coup de grâce. Since $h$ is a periodic continuous
function, it possesses a maximum $m$. Let $x_0$ be a point in $[0,1]$ with $h(x_0) = m$.
It follows from (D) that
$$h\Big(\frac{x_0}{2}\Big) + h\Big(\frac{x_0+1}{2}\Big) = 2m,$$
and hence, since neither summand exceeds $m$, that $h\big(\frac{x_0}{2}\big) = m$. Iteration gives $h\big(\frac{x_0}{2^n}\big) = m$ for all $n$, and hence
$h(0) = m$ by continuity. But $h(0) = 0$, and so $m = 0$, that is, $h(x) \le 0$
for all $x \in \mathbb{R}$. As $h(x)$ is an odd function, $h(x) < 0$ is impossible, hence
$h(x) = 0$ for all $x \in \mathbb{R}$, and Euler’s theorem is proved.


A great many corollaries can be derived from (1), the most famous of which 
concerns the values of Riemann’s zeta function at even positive integers 
(see the appendix to Chapter 9), 


$$\zeta(2k) = \sum_{n=1}^\infty\frac{1}{n^{2k}} \qquad (k \in \mathbb{N}). \qquad (4)$$
So to finish our story let us see how Euler, a few years later in 1755,
treated the series (4). We start with formula (2). Multiplying (2) by $x$ and
setting $y = \pi x$ we find for $|y| < \pi$:
$$y\cot y = 1 - 2\sum_{n=1}^\infty\frac{y^2}{\pi^2n^2 - y^2} = 1 - 2\sum_{n=1}^\infty\frac{y^2}{\pi^2n^2}\cdot\frac{1}{1 - \big(\frac{y}{\pi n}\big)^2}.$$
The last factor is the sum of a geometric series, hence
$$y\cot y = 1 - 2\sum_{n=1}^\infty\sum_{k=1}^\infty\Big(\frac{y}{\pi n}\Big)^{2k} = 1 - 2\sum_{k=1}^\infty\Big(\frac{1}{\pi^{2k}}\sum_{n=1}^\infty\frac{1}{n^{2k}}\Big)y^{2k},$$
and we have proved the remarkable result:


For all $k \in \mathbb{N}$, the coefficient of $y^{2k}$ in the power series expansion of $y\cot y$
equals
$$[y^{2k}]\,y\cot y = -\frac{2}{\pi^{2k}}\sum_{n=1}^\infty\frac{1}{n^{2k}} = -\frac{2}{\pi^{2k}}\,\zeta(2k). \qquad (5)$$


There is another, perhaps much more “canonical,” way to obtain a series
expansion of $y\cot y$. We know from analysis that $e^{iy} = \cos y + i\sin y$, and
thus
$$\cos y = \frac{e^{iy} + e^{-iy}}{2}, \qquad \sin y = \frac{e^{iy} - e^{-iy}}{2i},$$
which yields
$$y\cot y = iy\,\frac{e^{iy} + e^{-iy}}{e^{iy} - e^{-iy}} = iy\,\frac{e^{2iy} + 1}{e^{2iy} - 1}.$$
We now substitute $z = 2iy$, and get
$$y\cot y = \frac z2\cdot\frac{e^z + 1}{e^z - 1} = \frac z2 + \frac{z}{e^z - 1}. \qquad (6)$$
Thus all we need is a power series expansion of the function $\frac{z}{e^z - 1}$; note
that this function is defined and continuous on all of $\mathbb{R}$ (for $z = 0$ use the
power series of the exponential function, or alternatively de l’Hospital’s
rule, which yields the value 1). We write
$$\frac{z}{e^z - 1} =: \sum_{n\ge 0} B_n\frac{z^n}{n!}. \qquad (7)$$

The coefficients $B_n$ are known as the Bernoulli numbers. The left-hand
side of (6) is an even function (that is, $f(z) = f(-z)$), and thus we see that
$B_n = 0$ for odd $n \ge 3$, while $B_1 = -\frac12$ corresponds to the term of $\frac z2$ in (6).
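From
$$\Big(\sum_{n\ge 0} B_n\frac{z^n}{n!}\Big)(e^z - 1) = \Big(\sum_{n\ge 0} B_n\frac{z^n}{n!}\Big)\Big(\sum_{n\ge 1}\frac{z^n}{n!}\Big) = z$$
we obtain by comparing coefficients for $z^n$:
$$\sum_{k=0}^{n-1}\frac{B_k}{k!\,(n-k)!} = \begin{cases}1 & \text{for } n = 1,\\ 0 & \text{for } n \ne 1.\end{cases} \qquad (8)$$
We may compute the Bernoulli numbers recursively from (8). The value
$n = 1$ gives $B_0 = 1$, $n = 2$ yields $\frac{B_0}{2} + B_1 = 0$, that is $B_1 = -\frac12$, and
so on.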
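Recursion (8) translates directly into a short exact computation. For each $n \ge 1$, equation (8) determines $B_{n-1}$, whose coefficient there is $\frac{1}{(n-1)!\,1!}$. The sketch below reproduces the table of the first Bernoulli numbers and the values $B_{10} = \frac{5}{66}$ and $B_{12} = -\frac{691}{2730}$ quoted later. The function name is hypothetical.

```python
from fractions import Fraction
from math import factorial

def bernoulli_numbers(m):
    """B_0, ..., B_m from recursion (8): sum_{k=0}^{n-1} B_k / (k! (n-k)!) = [n == 1]."""
    B = []
    for n in range(1, m + 2):
        # equation (8) for this n fixes B_{n-1}; its coefficient is 1 / ((n-1)! * 1!)
        rest = sum(Fraction(B[k], factorial(k) * factorial(n - k)) for k in range(n - 1))
        rhs = 1 if n == 1 else 0
        B.append((Fraction(rhs) - rest) * factorial(n - 1))
    return B

B = bernoulli_numbers(12)
print([str(b) for b in B])
# ['1', '-1/2', '1/6', '0', '-1/30', '0', '1/42', '0', '-1/30', '0', '5/66', '0', '-691/2730']
```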


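Now we are almost done: The combination of (6) and (7) yields
$$y\cot y = \sum_{k=0}^\infty B_{2k}\frac{(2iy)^{2k}}{(2k)!} = \sum_{k=0}^\infty\frac{(-1)^k 2^{2k}B_{2k}}{(2k)!}\,y^{2k},$$
and out comes, with (5), Euler’s formula for $\zeta(2k)$:
$$\sum_{n=1}^\infty\frac{1}{n^{2k}} = \frac{(-1)^{k-1}2^{2k-1}B_{2k}}{(2k)!}\,\pi^{2k} \qquad (k \in \mathbb{N}). \qquad (9)$$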


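Looking at our table of the Bernoulli numbers, we thus obtain once again
the sum $\sum\frac{1}{n^2} = \frac{\pi^2}{6}$ from Chapter 9, and further
$$\sum_{n\ge1}\frac{1}{n^4} = \frac{\pi^4}{90}, \quad \sum_{n\ge1}\frac{1}{n^6} = \frac{\pi^6}{945}, \quad \sum_{n\ge1}\frac{1}{n^8} = \frac{\pi^8}{9450},$$
$$\sum_{n\ge1}\frac{1}{n^{10}} = \frac{\pi^{10}}{93555}, \quad \sum_{n\ge1}\frac{1}{n^{12}} = \frac{691\,\pi^{12}}{638512875}, \quad \dots$$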
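Formula (9) can be evaluated exactly from the Bernoulli numbers in the table, and the results compared with truncated sums $\sum_{n \le 10^5} n^{-2k}$. The sketch hard-codes $B_2, \dots, B_{12}$ as given in the text. It computes the rational factor $c_k$ in $\zeta(2k) = c_k\,\pi^{2k}$, which reproduces the denominators 6, 90, 945, 9450, 93555 and the factor $\frac{691}{638512875}$. The truncation point $10^5$ is an arbitrary choice.

```python
from fractions import Fraction
from math import factorial, pi

# B_2, B_4, ..., B_12 as listed in the text
B = {2: Fraction(1, 6), 4: Fraction(-1, 30), 6: Fraction(1, 42),
     8: Fraction(-1, 30), 10: Fraction(5, 66), 12: Fraction(-691, 2730)}

def zeta_even_coefficient(k):
    """Rational factor c_k with zeta(2k) = c_k * pi^(2k), taken from formula (9)."""
    return (-1) ** (k - 1) * 2 ** (2 * k - 1) * B[2 * k] / factorial(2 * k)

for k in range(1, 7):
    c = zeta_even_coefficient(k)
    partial = sum(1.0 / n ** (2 * k) for n in range(1, 100001))
    print(2 * k, c, float(c) * pi ** (2 * k), partial)
```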


The Bernoulli number $B_{10} = \frac{5}{66}$ that gets us $\zeta(10)$ looks innocuous enough,
but the next value $B_{12} = -\frac{691}{2730}$, needed for $\zeta(12)$, contains the large prime
factor 691 in the numerator. Euler had first computed some values $\zeta(2k)$
without noticing the connection to the Bernoulli numbers. Only the appearance
of the strange prime 691 put him on the right track.

Incidentally, since $\zeta(2k)$ converges to 1 for $k \to \infty$, equation (9) tells us
that the numbers $|B_{2k}|$ grow very fast, something that is not clear from
the first few values.

In contrast to all this, one knows very little about the values of the Riemann
zeta function at the odd integers $k \ge 3$; see page 64.


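$$\begin{array}{c|ccccccccc}
n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8\\ \hline
B_n & 1 & -\frac12 & \frac16 & 0 & -\frac1{30} & 0 & \frac1{42} & 0 & -\frac1{30}
\end{array}$$
The first few Bernoulli numbers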


Page 131 of Euler’s 1748 “Introductio in
Analysin Infinitorum”


Buffon’s needle problem


A French nobleman, Georges Louis Leclerc, Comte de Buffon, posed the 
following problem in 1777: 


Suppose that you drop a short needle on ruled paper — what is then 
the probability that the needle comes to lie in a position where it 
crosses one of the lines? 


The probability depends on the distance $d$ between the lines of the ruled
paper, and it depends on the length $\ell$ of the needle that we drop, or
rather it depends only on the ratio $\frac{\ell}{d}$. A short needle for our purpose is one
of length $\ell \le d$. In other words, a short needle is one that cannot cross
two lines at the same time (and will come to touch two lines only with
probability zero). The answer to Buffon’s problem may come as a surprise:
It involves the number $\pi$.

Theorem (“Buffon’s needle problem”)

If a short needle, of length $\ell$, is dropped on paper that is ruled with equally
spaced lines of distance $d \ge \ell$, then the probability that the needle comes
to lie in a position where it crosses one of the lines is exactly
$$p = \frac{2}{\pi}\,\frac{\ell}{d}.$$

The result means that from an experiment one can get approximate values
for $\pi$: If you drop a needle $N$ times, and get a positive answer (an
intersection) in $P$ cases, then $\frac PN$ should be approximately $\frac2\pi\frac\ell d$, that is, $\pi$
should be approximated by $\frac{2\ell N}{dP}$. The most extensive (and exhaustive)
test was perhaps done by Lazzarini in 1901, who allegedly even built a
machine in order to drop a stick 3408 times (with $\frac\ell d = \frac56$). He found
that it came to cross a line 1808 times, which yields the approximation
$\pi \approx 2\cdot\frac56\cdot\frac{3408}{1808} = 3.1415929\ldots$, which is correct to six digits of $\pi$, and
much too good to be true! (The values that Lazzarini chose lead directly
to the well-known approximation $\pi \approx \frac{355}{113}$; see page 51. This explains the
more than suspicious choices of 3408 and $\frac56$, where $\frac56\cdot 3408$ is a multiple
of 355. See [5] for a discussion of Lazzarini’s hoax.)
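A simulation shows what an honest experiment looks like. It assumes the usual model of a random drop: the height of the needle's midpoint above the nearest line below it is uniform in $[0, d)$, and its angle with the lines is uniform in $[0, \pi)$. The needle then crosses a line exactly when its vertical half-extent $\frac{\ell}{2}\sin\alpha$ reaches past a line. The function name, the seed and the number of drops are illustrative assumptions. The last line confirms that Lazzarini’s reported numbers give exactly $\frac{355}{113}$.

```python
import math
import random
from fractions import Fraction

def buffon_crossings(ell, d, drops, rng):
    """Simulate `drops` drops of a needle of length ell <= d on lines spaced d apart
    and return how many of them cross a line."""
    hits = 0
    for _ in range(drops):
        y = rng.uniform(0.0, d)               # midpoint height above a line
        alpha = rng.uniform(0.0, math.pi)     # angle with the horizontal lines
        half = 0.5 * ell * math.sin(alpha)    # half of the vertical extent
        if y - half <= 0.0 or y + half >= d:
            hits += 1
    return hits

rng = random.Random(12345)
N, ell, d = 200000, 5.0, 6.0
P = buffon_crossings(ell, d, N, rng)
print(P / N, 2 * ell / (math.pi * d))        # empirical vs. exact probability
print(2 * ell * N / (d * P))                 # the resulting estimate of pi

# Lazzarini's reported numbers reproduce 355/113 exactly:
print(Fraction(2 * 5 * 3408, 6 * 1808))      # prints: 355/113
```

With $2\cdot 10^5$ drops the estimate of $\pi$ is typically correct only to about two decimal places. That gap is what makes six correct digits from 3408 drops so suspicious.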

The needle problem can be solved by evaluating an integral. We will do that 
below, and by this method we will also solve the problem for a long needle. 
But the Book Proof, presented by E. Barbier in 1860, needs no integrals. 
It just drops a different needle ... 


If you drop any needle, short or long, then the expected number of crossings
will be
$$E = p_1 + 2p_2 + 3p_3 + \cdots,$$
where $p_1$ is the probability that the needle will come to lie with exactly one
crossing, $p_2$ is the probability that we get exactly two crossings, $p_3$ is the
probability for three crossings, etc. The probability that we get at least one
crossing, which Buffon’s problem asks for, is thus
$$p = p_1 + p_2 + p_3 + \cdots$$


(Events where the needle comes to lie exactly on a line, or with an end- 
point on one of the lines, have probability zero — so they can be ignored 
throughout our discussion.) 

On the other hand, if the needle is short then the probability of more than
one crossing is zero, $p_2 = p_3 = \cdots = 0$, and thus we get $E = p$: The
probability that we are looking for is just the expected number of crossings.
This reformulation is extremely useful, because now we can use linearity
of expectation (cf. page 116). Indeed, let us write $E(\ell)$ for the expected
number of crossings that will be produced by dropping a straight needle of
length $\ell$. If this length is $\ell = x + y$, and we consider the “front part” of
length $x$ and the “back part” of length $y$ of the needle separately, then we
get
$$E(x+y) = E(x) + E(y),$$
since the crossings produced are always just those produced by the front
part, plus those of the back part.

By induction on $n$ this “functional equation” implies that $E(nx) = nE(x)$
for all $n \in \mathbb{N}$, and then that $mE\big(\frac nm x\big) = E\big(m\frac nm x\big) = E(nx) = nE(x)$,
so that $E(rx) = rE(x)$ holds for all rational $r \in \mathbb{Q}$. Furthermore, $E(x)$
is clearly monotone in $x \ge 0$, from which we get that $E(x) = cx$ for all
$x \ge 0$, where $c = E(1)$ is some constant.

But what is the constant? 


For that we use needles of different shape. Indeed, let’s drop a “polygonal”
needle of total length $\ell$, which consists of straight pieces. Then the number
of crossings it produces is (with probability 1) the sum of the numbers of
crossings produced by its straight pieces. Hence, the expected number of
crossings is again
$$E = c\ell,$$
by linearity of expectation. (For that it is not even important whether the
straight pieces are joined together in a rigid or in a flexible way!)

The key to Barbier’s solution of Buffon’s needle problem is to consider a
needle that is a perfect circle $C$ of diameter $d$, which has length $x = d\pi$.
Such a needle, if dropped onto ruled paper, produces exactly two
intersections, always!




The circle can be approximated by polygons. Just imagine that together
with the circular needle $C$ we are dropping an inscribed polygon $P_n$, as
well as a circumscribed polygon $P^n$. Every line that intersects $P_n$ will also
intersect $C$, and if a line intersects $C$ then it also hits $P^n$. Thus the expected
numbers of intersections satisfy
$$E(P_n) \le E(C) \le E(P^n).$$

Now both $P_n$ and $P^n$ are polygons, so the number of crossings that we may
expect is “$c$ times length” for both of them, while for $C$ it is 2, whence
$$c\,\ell(P_n) \le 2 \le c\,\ell(P^n). \qquad (1)$$
Both $P_n$ and $P^n$ approximate $C$ for $n \to \infty$. In particular,
$$\lim_{n\to\infty}\ell(P_n) = d\pi = \lim_{n\to\infty}\ell(P^n),$$
and thus for $n \to \infty$ we infer from (1) that
$$c\,d\pi \le 2 \le c\,d\pi,$$
which gives $c = \frac{2}{\pi}\,\frac{1}{d}$.
But we could also have done it by calculus! The trick to obtain an “easy”
integral is to first consider the slope of the needle; let’s say it drops to lie
with an angle of $\alpha$ away from horizontal, where $\alpha$ will be in the range
$0 \le \alpha \le \frac{\pi}{2}$. (We will ignore the case where the needle comes to lie with
negative slope, since that case is symmetric to the case of positive slope, and
produces the same probability.) A needle that lies with angle $\alpha$ has height
$\ell\sin\alpha$, and the probability that such a needle crosses one of the horizontal
lines of distance $d$ is $\frac{\ell\sin\alpha}{d}$. Thus we get the probability by averaging over
the possible angles $\alpha$, as

$$P = \frac{2}{\pi}\int_0^{\pi/2} \frac{\ell\sin\alpha}{d}\,d\alpha = \frac{2\ell}{\pi d}\Big[-\cos\alpha\Big]_0^{\pi/2} = \frac{2\ell}{d\pi}.$$


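For a long needle, we get the same probability $\frac{\ell\sin\alpha}{d}$ as long as $\ell\sin\alpha \le d$,
that is, in the range $0 \le \alpha \le \arcsin\frac{d}{\ell}$. However, for larger angles $\alpha$ the
needle must cross a line, so the probability is 1. Hence we compute

$$\begin{aligned}
P &= \frac{2}{\pi}\left(\int_0^{\arcsin(d/\ell)} \frac{\ell\sin\alpha}{d}\,d\alpha + \int_{\arcsin(d/\ell)}^{\pi/2} 1\,d\alpha\right)\\
  &= \frac{2}{\pi}\left(\frac{\ell}{d}\Big[-\cos\alpha\Big]_0^{\arcsin(d/\ell)} + \Big(\frac{\pi}{2} - \arcsin\frac{d}{\ell}\Big)\right)\\
  &= 1 + \frac{2}{\pi}\left(\frac{\ell}{d}\Big(1 - \sqrt{1 - \frac{d^2}{\ell^2}}\Big) - \arcsin\frac{d}{\ell}\right)
\end{aligned}$$

for $\ell \ge d$.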


So the answer isn’t that pretty for a longer needle, but it provides us with a
nice exercise: Show (“just for safety”) that the formula yields $\frac{2}{\pi}$ for $\ell = d$,
that it is strictly increasing in $\ell$, and that it tends to 1 for $\ell \to \infty$.
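As a sanity check on both formulas, here is a minimal Python sketch that evaluates $P$ for short and long needles and compares it with a Monte Carlo simulation. The simulation assumes, as in the calculus argument, that the needle's centre is uniform between two neighbouring lines and its angle is uniform in $[0, \pi/2]$. The names `crossing_probability` and `simulate` are hypothetical helpers, and the sample size and seed are arbitrary choices.

```python
import math
import random


def crossing_probability(l: float, d: float) -> float:
    """Probability that a needle of length l crosses one of the lines at distance d."""
    if l <= d:
        return 2 * l / (math.pi * d)
    r = d / l
    return 1 + (2 / math.pi) * ((l / d) * (1 - math.sqrt(1 - r * r)) - math.asin(r))


def simulate(l: float, d: float, trials: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability (angle restricted to [0, pi/2])."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y = rng.uniform(0, d)                 # height of the centre above the line below it
        alpha = rng.uniform(0, math.pi / 2)   # angle with the horizontal
        half = 0.5 * l * math.sin(alpha)      # half of the vertical extent l*sin(alpha)
        if y <= half or y >= d - half:        # reaches the line below or the line above
            hits += 1
    return hits / trials


for l in (0.5, 1.0, 2.0, 5.0):
    print(l, round(crossing_probability(l, 1.0), 4), round(simulate(l, 1.0, 50_000), 4))
```

For a long needle the simulation counts "at least one crossing", which is the event computed above.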
“Got a problem?”

“A melancholic Latin square”


Pigeon-hole and double counting Chapter 28 




Some mathematical principles, such as the two in the title of this chapter, 
are so obvious that you might think they would only produce equally 
obvious results. To convince you that “It ain’t necessarily so” we 
illustrate them with examples that were suggested by Paul Erdős to be
included in The Book. We will encounter instances of them also in later 
chapters. 


Pigeon-hole principle.
If $n$ objects are placed in $r$ boxes, where $r < n$, then at least one of
the boxes contains more than one object.


Well, this is indeed obvious, there is nothing to prove. In the language of
mappings our principle reads as follows: Let $N$ and $R$ be two finite sets
with

$$|N| = n > r = |R|,$$

and let $f: N \to R$ be a mapping. Then there exists some $a \in R$ with
$|f^{-1}(a)| \ge 2$. We may even state a stronger inequality: There exists some
$a \in R$ with

$$|f^{-1}(a)| \ge \left\lceil \frac{n}{r} \right\rceil. \qquad (1)$$

In fact, otherwise we would have $|f^{-1}(a)| < \frac{n}{r}$ for all $a$, and hence
$n = \sum_{a \in R} |f^{-1}(a)| < r\,\frac{n}{r} = n$, which cannot be.

“The pigeon-holes from a bird’s perspective”
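Inequality (1) is easy to test on arbitrary maps. This sketch counts the fibres $|f^{-1}(a)|$ with `collections.Counter` and reports the fullest box, which by (1) must hold at least $\lceil n/r \rceil$ objects. The helper name `fullest_box` and the example map are illustrative only.

```python
import math
from collections import Counter


def fullest_box(objects, f):
    """Return (a, |f^{-1}(a)|) for a box a receiving the most objects under f."""
    counts = Counter(f(x) for x in objects)
    return counts.most_common(1)[0]


# Example: n = 23 objects, r = 5 boxes, an arbitrary (hypothetical) map N -> R.
n, r = 23, 5
box, load = fullest_box(range(n), lambda x: (7 * x + 3) % r)
print(box, load, math.ceil(n / r))   # load is at least ceil(23/5) = 5
```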


1. Numbers
Claim. Consider the numbers $1, 2, 3, \ldots, 2n$, and take any $n+1$
of them. Then there are two among these $n+1$ numbers which are
relatively prime.


This is again obvious. There must be two numbers which are only 1 apart,
and hence relatively prime.


But let us now turn the condition around. 


Claim. Suppose again $A \subseteq \{1, 2, \ldots, 2n\}$ with $|A| = n+1$. Then
there are always two numbers in $A$ such that one divides the other.


© Springer-Verlag GmbH Germany, part of Springer Nature 2018 


Both results are no longer true if one
replaces $n+1$ by $n$: For this consider
the sets $\{2, 4, 6, \ldots, 2n\}$, respectively
$\{n+1, n+2, \ldots, 2n\}$.


The reader may have fun in proving that
for $mn$ numbers the statement is
no longer true in general.


This is not so clear. As Erdős told us, he put this question to young Lajos
Pósa during dinner, and when the meal was over, Lajos had the answer. It
has remained one of Erdős’ favorite “initiation” questions to mathematics.
The (affirmative) solution is provided by the pigeon-hole principle. Write
every number $a \in A$ in the form $a = 2^k m$, where $m$ is an odd number
between 1 and $2n - 1$. Since there are $n+1$ numbers in $A$, but only $n$
different odd parts, there must be two numbers in $A$ with the same odd
part. Hence one is a multiple of the other.
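The proof is constructive: sorting $A$ into the $n$ boxes labelled by odd parts immediately exhibits the pair. Here is a minimal sketch, assuming $A$ is given as any iterable of distinct integers from $\{1, \ldots, 2n\}$. The function names are our own.

```python
def odd_part(a: int) -> int:
    """Strip all factors 2: write a = 2**k * m with m odd and return m."""
    while a % 2 == 0:
        a //= 2
    return a


def divisor_pair(A):
    """For n+1 distinct numbers from {1, ..., 2n}, return (x, y) with x | y.

    The boxes are the odd numbers 1, 3, ..., 2n-1. Two numbers in the same box
    differ only in their power of 2, so the smaller one divides the larger."""
    seen = {}
    for a in sorted(A):
        m = odd_part(a)
        if m in seen:
            return seen[m], a      # same odd part and seen[m] < a
        seen[m] = a
    return None                    # can only happen if |A| <= n


print(divisor_pair([7, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]))  # (7, 14)
```

The sets in the margin show that $n+1$ cannot be lowered: for $\{n+1, \ldots, 2n\}$ the function returns `None`.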


2. Sequences


Here is another one of Erdős’ favorites, contained in a paper of Erdős and
Szekeres on Ramsey problems.


Claim. In any sequence $a_1, a_2, \ldots, a_{mn+1}$ of $mn+1$ distinct real
numbers, there exists an increasing subsequence

$$a_{i_1} < a_{i_2} < \cdots < a_{i_{m+1}} \qquad (i_1 < i_2 < \cdots < i_{m+1})$$

of length $m+1$, or a decreasing subsequence

$$a_{j_1} > a_{j_2} > \cdots > a_{j_{n+1}} \qquad (j_1 < j_2 < \cdots < j_{n+1})$$

of length $n+1$, or both.


This time the application of the pigeon-hole principle is not immediate.
Associate to each $a_i$ the number $t_i$ which is the length of a longest increasing subsequence starting at $a_i$. If $t_i \ge m+1$ for some $i$, then we have
an increasing subsequence of length $m+1$. Suppose then that $t_i \le m$ for
all $i$. The function $f: a_i \mapsto t_i$ mapping $\{a_1, \ldots, a_{mn+1}\}$ to $\{1, \ldots, m\}$
tells us by (1) that there is some $s \in \{1, \ldots, m\}$ such that $f(a_i) = s$ for at least
$\left\lceil \frac{mn+1}{m} \right\rceil = n+1$ numbers $a_i$. Let $a_{j_1}, a_{j_2}, \ldots, a_{j_{n+1}}$ $(j_1 < \cdots < j_{n+1})$
be such numbers. Now look at two consecutive numbers $a_{j_i}, a_{j_{i+1}}$. If
$a_{j_i} < a_{j_{i+1}}$, then we would obtain an increasing subsequence of length
$s$ starting at $a_{j_{i+1}}$, and consequently an increasing subsequence of length
$s+1$ starting at $a_{j_i}$, which cannot be since $f(a_{j_i}) = s$. We thus obtain a
decreasing subsequence $a_{j_1} > a_{j_2} > \cdots > a_{j_{n+1}}$ of length $n+1$.
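The proof translates directly into an algorithm. The following Python sketch computes the numbers $t_i$ from right to left. It then either follows a longest increasing run, or it returns the elements sharing the most frequent value $s$, which the argument above shows must be decreasing. The naming is our own, and this is a quadratic-time illustration, not an efficient method.

```python
def monotone_subsequence(a, m, n):
    """For a list of mn+1 distinct reals, follow the proof and return
    ("increasing", seq) with len(seq) == m+1 or ("decreasing", seq) with len(seq) == n+1."""
    N = len(a)
    assert N == m * n + 1 and len(set(a)) == N
    # t[i] = length of a longest increasing subsequence starting at a[i];
    # nxt[i] = index of the next element of one such subsequence.
    t = [1] * N
    nxt = [None] * N
    for i in range(N - 1, -1, -1):
        for j in range(i + 1, N):
            if a[j] > a[i] and t[j] + 1 > t[i]:
                t[i], nxt[i] = t[j] + 1, j
    for i in range(N):
        if t[i] >= m + 1:                    # the increasing case
            seq, k = [], i
            while k is not None and len(seq) < m + 1:
                seq.append(a[k])
                k = nxt[k]
            return "increasing", seq
    # Now every t[i] lies in {1, ..., m}; by (1) some value s occurs at least n+1 times,
    # and the elements with t = s (in their original order) are decreasing.
    by_value = {}
    for i in range(N):
        by_value.setdefault(t[i], []).append(a[i])
    s = max(by_value, key=lambda v: len(by_value[v]))
    return "decreasing", by_value[s][: n + 1]


print(monotone_subsequence([3, 1, 4, 5, 9, 2, 6, 8, 7, 0], m=3, n=3))  # ('increasing', [3, 4, 5, 6])
```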


This simple-sounding result on monotone subsequences has a highly nonobvious consequence on the dimension of graphs. We don’t need here the
notion of dimension for general graphs, but only for complete graphs $K_n$.
It can be phrased in the following way. Let $N = \{1, \ldots, n\}$, $n \ge 3$, and
consider $m$ permutations $\pi_1, \ldots, \pi_m$ of $N$. We say that the permutations
$\pi_i$ represent $K_n$ if to every three distinct numbers $i, j, k$ there exists a permutation $\pi$ in which $k$ comes after both $i$ and $j$. The dimension of $K_n$ is
then the smallest $m$ for which a representation $\pi_1, \ldots, \pi_m$ exists.
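The definition of a representation can be checked mechanically by brute force over all ordered triples. The sketch below tests the condition directly, and the function `represents` is a hypothetical name. As usage it checks the examples for $K_3$ and $K_4$ given in this section, and the four permutations of $\{1, \ldots, 12\}$ from the margin table. It only verifies that the given permutations represent $K_n$. It says nothing about minimality.

```python
from itertools import permutations


def represents(perms, n):
    """True if the permutations (tuples of 1..n) represent K_n, i.e. for all
    distinct i, j, k some permutation puts k after both i and j."""
    positions = [{x: idx for idx, x in enumerate(p)} for p in perms]
    return all(
        any(pos[k] > pos[i] and pos[k] > pos[j] for pos in positions)
        for i, j, k in permutations(range(1, n + 1), 3)
    )


# The four permutations of {1, ..., 12} from the margin table.
K12 = [
    (1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 4),
    (2, 3, 4, 8, 7, 6, 5, 12, 11, 10, 9, 1),
    (3, 4, 1, 11, 12, 9, 10, 6, 5, 8, 7, 2),
    (4, 1, 2, 10, 9, 12, 11, 7, 8, 5, 6, 3),
]

print(represents([(1, 2, 3), (2, 3, 1), (3, 1, 2)], 3))          # True
print(represents([(1, 2, 3, 4), (2, 4, 3, 1), (1, 4, 3, 2)], 4))  # True
print(represents(K12, 12))                                        # True
```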


As an example we have $\dim(K_3) = 3$ since any one of the three numbers
must come last, as in $\pi_1 = (1,2,3)$, $\pi_2 = (2,3,1)$, $\pi_3 = (3,1,2)$. What
about $K_4$? Note first $\dim(K_n) \le \dim(K_{n+1})$: just delete $n+1$ in a
representation of $K_{n+1}$. So, $\dim(K_4) \ge 3$, and, in fact, $\dim(K_4) = 3$, by
taking

$$\pi_1 = (1,2,3,4), \quad \pi_2 = (2,4,3,1), \quad \pi_3 = (1,4,3,2).$$

π1:  1  2  3  5  6  7  8  9 10 11 12  4
π2:  2  3  4  8  7  6  5 12 11 10  9  1
π3:  3  4  1 11 12  9 10  6  5  8  7  2
π4:  4  1  2 10  9 12 11  7  8  5  6  3


These four permutations represent $K_{12}$.

It is not quite so easy to prove $\dim(K_5) = 4$, but then, surprisingly, the
dimension stays at 4 up to $n = 12$, while $\dim(K_{13}) = 5$. So $\dim(K_n)$
seems to be a pretty wild function. Well, it is not! With $n$ going to infinity,
$\dim(K_n)$ is, in fact, a very well-behaved function, and the key for finding
a lower bound is the pigeon-hole principle. We claim

$$\dim(K_n) \ge \log_2 \log_2 n. \qquad (2)$$


Since, as we have seen, $\dim(K_n)$ is a monotone function in $n$, it suffices to
verify (2) for $n = 2^{2^p} + 1$, that is, we have to show that

$$\dim(K_n) \ge p + 1 \quad\text{for}\quad n = 2^{2^p} + 1.$$

Suppose, on the contrary, $\dim(K_n) \le p$, and let $\pi_1, \ldots, \pi_p$ be representing
permutations of $N = \{1, 2, \ldots, 2^{2^p} + 1\}$. Now we use our result on monotone subsequences $p$ times. In $\pi_1$ there exists a monotone subsequence $A_1$
of length $2^{2^{p-1}} + 1$ (apply the claim with $m = n = 2^{2^{p-1}}$, so that $mn + 1 = 2^{2^p} + 1$; it does not matter whether the subsequence is increasing or decreasing).
Look at this set $A_1$ in $\pi_2$. Using our result again, we find a monotone subsequence $A_2$ of $A_1$ in $\pi_2$ of length $2^{2^{p-2}} + 1$, and $A_2$ is, of course, also
monotone in $\pi_1$. Continuing, we eventually find a subsequence $A_p$ of size
$2^{2^0} + 1 = 3$ which is monotone in all permutations $\pi_i$. Let $A_p = \{a, b, c\}$ with $a < b < c$;
then in every $\pi_i$ these three numbers appear either in the order $a, b, c$ or in the order $c, b, a$. But this cannot be, since there
must be a permutation where $b$ comes after $a$ and $c$.


The right asymptotic growth was provided by Joel Spencer (upper bound)
and by Füredi, Hajnal, Rödl and Trotter (lower bound):

$$\dim(K_n) = \log_2 \log_2 n + \left(\tfrac{1}{2} + o(1)\right) \log_2 \log_2 \log_2 n.$$

But this is not the whole story: In 1999, Morris and Hoşten found a method
which, in principle, establishes the precise value of $\dim(K_n)$. Using their
result and a computer one can obtain the values given in the margin. This
is truly astounding! Just consider how many permutations of size 1422564
there are. How does one decide whether 7 or 8 of them are required to
represent $K_{1422564}$?

$\dim(K_n) \le 4 \iff n \le 12$
$\dim(K_n) \le 5 \iff n \le 81$
$\dim(K_n) \le 6 \iff n \le 2646$
$\dim(K_n) \le 7 \iff n \le 1422564$


3. Sums 


Paul Erdős attributes the following nice application of the pigeon-hole
principle to Andrew Vazsonyi and Marta Sved:


Claim. Suppose we are given $n$ integers $a_1, \ldots, a_n$, which need
not be distinct. Then there is always a set of consecutive numbers
$a_{k+1}, a_{k+2}, \ldots, a_\ell$ whose sum $\sum_{i=k+1}^{\ell} a_i$ is a multiple of $n$.

For the proof we set $N = \{0, 1, \ldots, n\}$ and $R = \{0, 1, \ldots, n-1\}$. Consider the map $f: N \to R$, where $f(m)$ is the remainder of $a_1 + \cdots + a_m$
upon division by $n$. Since $|N| = n+1 > n = |R|$, it follows that there are
two sums $a_1 + \cdots + a_k$, $a_1 + \cdots + a_\ell$ $(k < \ell)$ with the same remainder,
where the first sum may be the empty sum denoted by 0. It follows that

$$\sum_{i=k+1}^{\ell} a_i = \sum_{i=1}^{\ell} a_i - \sum_{i=1}^{k} a_i$$

has remainder 0, end of proof.
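The proof is a prefix-sum argument. It runs in linear time if we remember the first prefix length at which each remainder appears. Here is a minimal sketch. The names are our own, and Python's `%` already returns a remainder in $\{0, \ldots, n-1\}$ even for negative sums.

```python
def block_with_sum_divisible_by_n(a):
    """Return (k, l), k < l, such that a[k:l] (i.e. a_{k+1}, ..., a_l in the text's
    1-based notation) has a sum that is a multiple of n = len(a)."""
    n = len(a)
    first_seen = {0: 0}      # the empty sum (m = 0) has remainder 0
    total = 0
    for m in range(1, n + 1):
        total += a[m - 1]
        r = total % n
        if r in first_seen:  # two of the n+1 prefix sums share a remainder
            return first_seen[r], m
        first_seen[r] = m
    raise AssertionError("unreachable: n+1 prefix sums but only n remainders")


print(block_with_sum_divisible_by_n([3, 1, 4, 1, 5]))  # (1, 3): 1 + 4 = 5
```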


Let us turn to the second principle: counting in two ways. By this we mean 
the following. 


Double counting 

Suppose that we are given two finite sets $R$ and $C$ and a subset
$S \subseteq R \times C$. Whenever $(p, q) \in S$, then we say $p$ and $q$ are incident.
If $r_p$ denotes the number of elements that are incident to $p \in R$,
and $c_q$ denotes the number of elements that are incident to $q \in C$,
then

$$\sum_{p \in R} r_p = |S| = \sum_{q \in C} c_q. \qquad (3)$$


Again, there is nothing to prove. The first sum classifies the pairs in S 
according to the first entry, while the second sum classifies the same pairs 
according to the second entry. 

There is a useful way to picture the set $S$. Consider the matrix $A = (a_{pq})$,
the incidence matrix of $S$, where the rows and columns of $A$ are indexed
by the elements of $R$ and $C$, respectively, with

$$a_{pq} = \begin{cases} 1 & \text{if } (p,q) \in S, \\ 0 & \text{if } (p,q) \notin S. \end{cases}$$

With this set-up, $r_p$ is the sum of the $p$-th row of $A$ and $c_q$ is the sum of the
$q$-th column. Hence the first sum in (3) adds the entries of $A$ (that is, counts
the elements in $S$) by rows, and the second sum by columns.

The following example should make this correspondence clear. Let $R =
C = \{1, 2, \ldots, 8\}$, and set $S = \{(i, j) : i \text{ divides } j\}$. We then obtain the
matrix in the margin, which only displays the 1’s.
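The margin matrix did not survive in this copy, so a short script can regenerate the divisibility incidence matrix for $R = C = \{1, \ldots, 8\}$ and confirm identity (3) by summing rows and columns. The helper `incidence_matrix` is our own name. Zeros are printed as dots to mimic a display of only the 1's.

```python
def incidence_matrix(n):
    """Rows i and columns j run over 1..n; the entry is 1 iff i divides j."""
    return [[1 if j % i == 0 else 0 for j in range(1, n + 1)] for i in range(1, n + 1)]


A = incidence_matrix(8)
for i, row in enumerate(A, start=1):
    print(f"{i}: " + " ".join("1" if x else "." for x in row))

row_sums = [sum(row) for row in A]          # r_p: number of multiples of p up to 8
col_sums = [sum(col) for col in zip(*A)]    # c_q: number of divisors of q
print(sum(row_sums), sum(col_sums))         # 20 20: both equal |S|, as in (3)
```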


4. Numbers again 


Look at the table on the left. The number of 1’s in column $j$ is precisely the
number of divisors of $j$; let us denote this number by $t(j)$. Let us ask how
large this number $t(j)$ is on the average when $j$ ranges from 1 to $n$. Thus,
we ask for the quantity

$$\bar t(n) = \frac{1}{n}\sum_{j=1}^{n} t(j).$$

n:           1    2    3    4    5    6    7     8
$\bar t(n)$:   1   3/2  5/3   2    2   7/3  16/7  5/2
The first few values of $\bar t(n)$

How large is $\bar t(n)$ for arbitrary $n$? At first glance, this seems hopeless. For
prime numbers $p$ we have $t(p) = 2$, while for $2^k$ we obtain a large number
$t(2^k) = k + 1$. So, $t(n)$ is a wildly jumping function, and we surmise that
the same is true for $\bar t(n)$. Wrong guess, the opposite is true! Counting in
two ways provides an unexpected and simple answer.

Consider the matrix $A$ (as above) for the integers 1 up to $n$. Counting by
columns we get $\sum_{j=1}^{n} t(j)$. How many 1’s are in row $i$? Easy enough, the
1’s correspond to the multiples of $i$: $1i, 2i, \ldots$, and the last multiple not
exceeding $n$ is $\lfloor n/i \rfloor\, i$. Hence we obtain

$$\bar t(n) = \frac{1}{n}\sum_{j=1}^{n} t(j) = \frac{1}{n}\sum_{i=1}^{n} \left\lfloor \frac{n}{i} \right\rfloor \le \frac{1}{n}\sum_{i=1}^{n} \frac{n}{i} = \sum_{i=1}^{n} \frac{1}{i},$$

where the error in each summand, when passing from $\lfloor n/i \rfloor$ to $n/i$, is less
than 1. Now the last sum is the $n$-th harmonic number $H_n$, so we obtain
$H_n - 1 < \bar t(n) \le H_n$, and together with the estimates of $H_n$ on page 13
this gives

$$\log n - 1 < H_n - 1 - \frac{1}{n} < \bar t(n) \le H_n < \log n + 1.$$

Thus we have proved the remarkable result that, while $t(n)$ is totally erratic,
the average $\bar t(n)$ behaves beautifully: It differs from $\log n$ by less than 1.
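Both ways of counting can be coded side by side: by columns (counting the divisors of each $j$) and by rows (adding $\lfloor n/i \rfloor$). This sketch uses `fractions.Fraction` for exact averages. The function names are our own, and the brute-force column count is only meant for small $n$.

```python
import math
from fractions import Fraction


def tbar_by_columns(n):
    """Average number of divisors of 1..n, counting the divisors of each j directly."""
    total = sum(sum(1 for i in range(1, j + 1) if j % i == 0) for j in range(1, n + 1))
    return Fraction(total, n)


def tbar_by_rows(n):
    """The same average counted row by row: row i contains floor(n/i) ones."""
    return Fraction(sum(n // i for i in range(1, n + 1)), n)


print([str(tbar_by_rows(n)) for n in range(1, 9)])
# ['1', '3/2', '5/3', '2', '2', '7/3', '16/7', '5/2']
for n in (10, 100, 1000, 10_000):
    print(n, float(tbar_by_rows(n)), math.log(n))
```

The row count needs only $n$ integer divisions, which is exactly why the double counting makes $\bar t(n)$ tractable.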


5. Graphs 


Let $G$ be a finite simple graph with vertex set $V$ and edge set $E$. We have
defined in Chapter 13 the degree $d(v)$ of a vertex $v$ as the number of edges
which have $v$ as an end-vertex. In the example of the figure, the vertices
$1, 2, \ldots, 7$ have degrees $3, 2, 4, 3, 3, 2, 3$, respectively.
Almost every book in graph theory starts with the following result (that we
have already encountered in Chapters 13 and 20):

$$\sum_{v \in V} d(v) = 2|E|. \qquad (4)$$

For the proof consider $S \subseteq V \times E$, where $S$ is the set of pairs $(v, e)$ such
that $v \in V$ is an end-vertex of $e \in E$. Counting $S$ in two ways gives on the
one hand $\sum_{v \in V} d(v)$, since every vertex contributes $d(v)$ to the count, and
on the other hand $2|E|$, since every edge has two ends.


As simple as the result (4) appears, it has many important consequences, 
some of which will be discussed as we go along. We want to single out in 
this section the following beautiful application to an extremal problem on
graphs. Here is the problem:


Suppose $G = (V, E)$ has $n$ vertices and contains no cycle of
length 4 (denoted by $C_4$), that is, no four vertices $v_1, v_2, v_3, v_4$ with
$v_1v_2, v_2v_3, v_3v_4, v_4v_1 \in E$. How many edges can $G$ have at most?


As an example, the graph in the margin on 5 vertices contains no 4-cycle 
and has 6 edges. The reader may easily show that on 5 vertices the maximal 
number of edges is 6, and that this graph is indeed the only graph on 5 
vertices with 6 edges that has no 4-cycle. 

Let us tackle the general problem. Let $G$ be a graph on $n$ vertices without
a 4-cycle. As above we denote by $d(u)$ the degree of $u$. Now we count
the following set $S$ in two ways: $S$ is the set of pairs $(u, \{v, w\})$ where
$u$ is adjacent to $v$ and to $w$, with $v \ne w$. In other words, we count all
occurrences of a path $v$–$u$–$w$ with middle vertex $u$.
Summing over $u$, we find $|S| = \sum_{u \in V} \binom{d(u)}{2}$. On the other hand,
every pair $\{v, w\}$ has at most one common neighbor (by the $C_4$-condition).
Hence $|S| \le \binom{n}{2}$, and we conclude

$$\sum_{u \in V} \binom{d(u)}{2} \le \binom{n}{2}$$

or

$$\sum_{u \in V} d(u)^2 \le n(n-1) + \sum_{u \in V} d(u). \qquad (5)$$


Next (and this is quite typical for this sort of extremal problems) we
apply the Cauchy–Schwarz inequality to the vectors $(d(u_1), \ldots, d(u_n))$
and $(1, 1, \ldots, 1)$, obtaining

$$\Big(\sum_{u \in V} d(u)\Big)^2 \le n \sum_{u \in V} d(u)^2,$$

and hence by (5)

$$\Big(\sum_{u \in V} d(u)\Big)^2 \le n^2(n-1) + n \sum_{u \in V} d(u).$$

Invoking (4) we find

$$4|E|^2 \le n^2(n-1) + 2n|E|$$

or

$$|E|^2 - \frac{n}{2}\,|E| - \frac{n^2(n-1)}{4} \le 0.$$

Solving the corresponding quadratic equation we thus obtain the following
result of István Reiman.
Theorem. If the graph $G$ on $n$ vertices contains no 4-cycles, then

$$|E| \le \left\lfloor \frac{n}{4}\left(1 + \sqrt{4n - 3}\right) \right\rfloor. \qquad (6)$$

For $n = 5$ this gives $|E| \le 6$, and the graph above shows that equality
can hold.
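For small $n$ the bound (6) can be compared with an exhaustive search. The sketch below uses the criterion that a graph contains a 4-cycle exactly when two distinct vertices have at least two common neighbours. The names are our own, and the search over all $2^{\binom{n}{2}}$ labelled graphs is only feasible up to about $n = 6$.

```python
import math
from itertools import combinations


def reiman_bound(n: int) -> int:
    """The upper bound (6): floor(n/4 * (1 + sqrt(4n - 3)))."""
    return math.floor(n / 4 * (1 + math.sqrt(4 * n - 3)))


def has_c4(n, edges):
    """A graph has a 4-cycle iff some pair of vertices has >= 2 common neighbours."""
    nbrs = [set() for _ in range(n)]
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return any(len(nbrs[v] & nbrs[w]) >= 2 for v, w in combinations(range(n), 2))


def max_c4_free_edges(n):
    """Brute force over all graphs on n labelled vertices (feasible for n <= 6)."""
    all_edges = list(combinations(range(n), 2))
    best = 0
    for mask in range(1 << len(all_edges)):
        edges = [e for b, e in enumerate(all_edges) if mask >> b & 1]
        if len(edges) > best and not has_c4(n, edges):
            best = len(edges)
    return best


for n in range(2, 6):
    print(n, max_c4_free_edges(n), reiman_bound(n))
```

For $n = 5$ both numbers equal 6, in line with the example in the margin.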

Counting in two ways has thus produced in an easy way an upper bound 
on the number of edges. But how good is the bound (6) in general? The 
following beautiful example [2] [3] [6] shows that it is almost sharp. As is 
often the case in such problems, finite geometry leads the way. 


In presenting the example we assume that the reader is familiar with the
finite field $\mathbb{Z}_p$ of integers modulo a prime $p$ (see page 20). Consider the
3-dimensional vector space $X$ over $\mathbb{Z}_p$. We construct from $X$ the following graph $G_p$. The vertices of $G_p$ are the one-dimensional subspaces
$[v] := \mathrm{span}_{\mathbb{Z}_p}\{v\}$, $0 \ne v \in X$, and we connect two such subspaces
$[v] \ne [w]$ by an edge if

$$\langle v, w \rangle = v_1 w_1 + v_2 w_2 + v_3 w_3 = 0.$$

Note that it does not matter which vector $\ne 0$ we take from the subspace.
In the language of geometry, the vertices are the points of the projective
plane over $\mathbb{Z}_p$, and $[w]$ is adjacent to $[v]$ if $w$ lies on the polar line of $v$.
As an example, the graph $G_2$ has no 4-cycle and contains 9 edges, which
almost reaches the bound 10 given by (6). We want to show that this is true
for any prime $p$.

Let us first prove that $G_p$ satisfies the $C_4$-condition. If $[u]$ is a common
neighbor of $[v]$ and $[w]$, then $u$ is a solution of the linear equations

$$\begin{aligned} v_1 x + v_2 y + v_3 z &= 0\\ w_1 x + w_2 y + w_3 z &= 0. \end{aligned}$$

Since $v$ and $w$ are linearly independent, we infer that the solution space
has dimension 1, and hence that the common neighbor $[u]$ is unique.


Next, we ask how many vertices $G_p$ has. It’s double counting again. The
space $X$ contains $p^3 - 1$ vectors $\ne 0$. Since every one-dimensional subspace contains $p - 1$ vectors $\ne 0$, we infer that $X$ has $\frac{p^3 - 1}{p - 1} = p^2 + p + 1$
one-dimensional subspaces, that is, $G_p$ has $n = p^2 + p + 1$ vertices. Similarly, any two-dimensional subspace contains $p^2 - 1$ vectors $\ne 0$, and hence
$\frac{p^2 - 1}{p - 1} = p + 1$ one-dimensional subspaces.

It remains to determine the number of edges in $G_p$, or, what is the same by
(4), the degrees. By the construction of $G_p$, the vertices adjacent to $[u]$ are
the solutions of the equation

$$u_1 x + u_2 y + u_3 z = 0. \qquad (7)$$


The solution space of (7) is a two-dimensional subspace, and hence there 
are p + 1 vertices adjacent to [u]. But beware, it may happen that u itself 
is a solution of (7). In this case there are only p vertices adjacent to [u]. 


The graph $G_2$: its vertices are all seven nonzero triples $(x, y, z)$.
The matrix for $G_2$


In summary, we obtain the following result: If $u$ lies on the conic given by
$x^2 + y^2 + z^2 = 0$, then $d([u]) = p$, and, if not, then $d([u]) = p + 1$. So it
remains to find the number of one-dimensional subspaces on the conic

$$x^2 + y^2 + z^2 = 0.$$

Let us anticipate the result which we shall prove in a moment.


Claim. There are precisely $p^2$ solutions $(x, y, z)$ of the equation
$x^2 + y^2 + z^2 = 0$, and hence (excepting the zero solution) precisely
$\frac{p^2 - 1}{p - 1} = p + 1$ vertices in $G_p$ of degree $p$.


With this, we complete our analysis of $G_p$. There are $p + 1$ vertices of
degree $p$, hence $(p^2 + p + 1) - (p + 1) = p^2$ vertices of degree $p + 1$.
Using (4), we obtain

$$|E| = \frac{(p+1)p}{2} + \frac{p^2(p+1)}{2} = \frac{(p+1)^2 p}{2} = \frac{(p+1)p}{4}\big(1 + (2p+1)\big) = \frac{(p+1)p}{4}\Big(1 + \sqrt{4p^2 + 4p + 1}\Big).$$

Setting $n = p^2 + p + 1$, the last equation reads

$$|E| = \frac{n-1}{4}\left(1 + \sqrt{4n - 3}\right),$$

and we see that this almost agrees with (6).
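The construction of $G_p$ is concrete enough to build for small primes. In this sketch each one-dimensional subspace is represented by its vector whose first nonzero coordinate is 1, which picks exactly one vector per subspace. The names are our own. With it one can check the vertex count $p^2 + p + 1$, the edge count $\frac{(p+1)^2 p}{2}$, and the $p + 1$ vertices on the conic.

```python
from itertools import combinations, product


def projective_points(p):
    """One vector per 1-dimensional subspace of Z_p^3: the one whose first
    nonzero coordinate equals 1."""
    points = []
    for v in product(range(p), repeat=3):
        nonzero = [x for x in v if x != 0]
        if nonzero and nonzero[0] == 1:
            points.append(v)
    return points


def polarity_graph(p):
    """Vertices [v]; [v] and [w] (v != w) are adjacent iff <v, w> = 0 in Z_p."""
    points = projective_points(p)
    edges = [(v, w) for v, w in combinations(points, 2)
             if sum(x * y for x, y in zip(v, w)) % p == 0]
    return points, edges


for p in (2, 3, 5, 7):
    points, edges = polarity_graph(p)
    on_conic = [v for v in points if sum(x * x for x in v) % p == 0]
    print(p, len(points), len(edges), (p + 1) ** 2 * p // 2, len(on_conic))
```

The degrees ($p$ on the conic, $p + 1$ elsewhere) and the $C_4$-condition can be checked from the same edge list.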


Now to the proof of the claim. The following argument is a beautiful application of linear algebra involving symmetric matrices and their eigenvalues.
We will encounter the same method in Chapter 44, which is no coincidence:
both proofs are from the same paper by Erdős, Rényi and Sós.

We represent the one-dimensional subspaces of $X$ as before by vectors
$v_1, v_2, \ldots, v_{p^2+p+1}$, any two of which are linearly independent. Similarly,
we may represent the two-dimensional subspaces by the same set of vectors, where the subspace corresponding to $u = (u_1, u_2, u_3)$ is the set of solutions of the equation $u_1 x + u_2 y + u_3 z = 0$ as in (7). (Of course, this is just
the duality principle of linear algebra.) Hence, by (7), a one-dimensional
subspace, represented by $v_i$, is contained in the two-dimensional subspace,
represented by $v_j$, if and only if $\langle v_i, v_j \rangle = 0$.

Consider now the matrix $A = (a_{ij})$ of size $(p^2+p+1) \times (p^2+p+1)$, defined
as follows: The rows and columns of $A$ correspond to $v_1, \ldots, v_{p^2+p+1}$ (we
use the same numbering for rows and columns) with

$$a_{ij} := \begin{cases} 1 & \text{if } \langle v_i, v_j \rangle = 0, \\ 0 & \text{otherwise.} \end{cases}$$

$A$ is thus a real symmetric matrix, and we have $a_{ii} = 1$ if $\langle v_i, v_i \rangle = 0$, that
is, precisely when $v_i$ lies on the conic $x^2 + y^2 + z^2 = 0$. Thus, all that
remains to show is that

$$\operatorname{trace} A = p + 1.$$
From linear algebra we know that the trace equals the sum of the eigenvalues. And here comes the trick: While $A$ looks complicated, the matrix $A^2$
is easy to analyze. We note two facts:


• Any row of $A$ contains precisely $p + 1$ 1’s. This implies that $p + 1$ is an
eigenvalue of $A$, since $A\mathbf{1} = (p+1)\mathbf{1}$, where $\mathbf{1}$ is the vector consisting
of 1’s.


• For any two distinct rows $v_i, v_j$ there is exactly one column with a 1 in
both rows (the column corresponding to the unique subspace spanned
by $v_i, v_j$).


Using these facts we find

$$A^2 = \begin{pmatrix} p+1 & 1 & \cdots & 1 \\ 1 & p+1 & & \vdots \\ \vdots & & \ddots & 1 \\ 1 & \cdots & 1 & p+1 \end{pmatrix} = pI + J,$$

where $I$ is the identity matrix and $J$ is the all-ones-matrix. Now, $J$ has
the eigenvalue $p^2 + p + 1$ (of multiplicity 1) and 0 (of multiplicity $p^2 + p$).
Hence $A^2$ has the eigenvalues $p^2 + 2p + 1 = (p+1)^2$ of multiplicity 1 and $p$
of multiplicity $p^2 + p$. Since $A$ is real and symmetric, hence diagonalizable,
we find that $A$ has the eigenvalue $p + 1$ or $-(p + 1)$ and $p^2 + p$ eigenvalues
$\pm\sqrt{p}$. From Fact 1 above, the first eigenvalue must be $p + 1$. Suppose
that $\sqrt{p}$ has multiplicity $r$, and $-\sqrt{p}$ multiplicity $s$, then

$$\operatorname{trace} A = (p + 1) + r\sqrt{p} - s\sqrt{p}.$$

But now we are home: Since the trace is an integer and $\sqrt{p}$ is irrational, we must have $r = s$,
so $\operatorname{trace} A = p + 1$.
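The identity $A^2 = pI + J$ and the conclusion $\operatorname{trace} A = p + 1$ can also be confirmed numerically for small primes. This pure-Python sketch represents each one-dimensional subspace by its vector whose first nonzero coordinate is 1. It builds $A$ including its diagonal and squares it directly. The helper names are our own.

```python
from itertools import product


def polarity_matrix(p):
    """The 0/1 matrix A with a_ij = 1 iff <v_i, v_j> = 0 in Z_p (diagonal included)."""
    reps = [v for v in product(range(p), repeat=3)
            if any(v) and [x for x in v if x][0] == 1]
    return [[1 if sum(x * y for x, y in zip(v, w)) % p == 0 else 0 for w in reps]
            for v in reps]


def matmul(X, Y):
    """Plain-Python matrix product (adequate for the small sizes used here)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


for p in (2, 3, 5):
    A = polarity_matrix(p)
    n = len(A)
    A2 = matmul(A, A)
    is_pI_plus_J = all(A2[i][j] == (p + 1 if i == j else 1)
                       for i in range(n) for j in range(n))
    print(p, n, is_pI_plus_J, sum(A[i][i] for i in range(n)))   # trace should be p + 1
```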


6. Sperner’s Lemma 
In 1912, Luitzen Brouwer published his famous fixed point theorem: 


Every continuous function $f: B^n \to B^n$ of an $n$-dimensional ball
to itself has a fixed point (a point $x \in B^n$ with $f(x) = x$).


For dimension 1, that is for an interval, this follows easily from the inter- 
mediate value theorem, but for higher dimensions Brouwer’s proof needed 
some sophisticated machinery. It was therefore quite a surprise when in 
1928 young Emanuel Sperner (he was 23 at the time) produced a simple 
combinatorial result from which both Brouwer’s fixed point theorem and 
the invariance of the dimension under continuous bijective maps could be 
deduced. And what’s more, Sperner’s ingenious lemma is matched by an 
equally beautiful proof — it is just double counting. 
The tricolored triangles are shaded.


We discuss Sperner’s lemma, and Brouwer’s theorem as a consequence, for 
the first interesting case, that of dimension n = 2. The energetic reader 
should find it not too difficult to extend the proofs to higher dimensions 
(by induction on the dimension). 


Sperner’s Lemma. 

Suppose that some “big” triangle with vertices $V_1, V_2, V_3$ is triangulated
(that is, decomposed into a finite number of “small” triangles that fit together edge-by-edge).

Assume that the vertices in the triangulation get “colors” from the set
$\{1, 2, 3\}$ such that $V_i$ receives the color $i$ (for each $i$), and only the colors $i$ and $j$ are used for vertices along the edge from $V_i$ to $V_j$ (for $i \ne j$),
while the interior vertices are colored arbitrarily with 1, 2 or 3.

Then in the triangulation there must be a small “tricolored” triangle, which 
has all three different vertex colors. 


Proof. We will prove a stronger statement: The number of tricolored
triangles is not only nonzero, it is always odd.

Consider the dual graph to the triangulation, but don’t take all its edges,
only those which cross an edge that has end-vertices with the (different)
colors 1 and 2. Thus we get a “partial dual graph” which has degree 1 at all
vertices that correspond to tricolored triangles, degree 2 for all triangles in
which only the two colors 1 and 2 appear, and degree 0 for triangles that do not
have both colors 1 and 2. Thus only the tricolored triangles correspond to
vertices of odd degree (of degree 1).

However, the vertex of the dual graph which corresponds to the outside of
the triangulation has odd degree: in fact, along the big edge from $V_1$ to $V_2$,
there is an odd number of changes between 1 and 2. Thus an odd number
of edges of the partial dual graph crosses this big edge, while the other big
edges cannot have both 1 and 2 occurring as colors.

Now since the number of odd-degree vertices in any finite graph is even (by
equation (4)), we find that the number of small triangles with three different
colors (corresponding to odd inside vertices of our dual graph) is odd.
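The parity statement is easy to test on a regular triangulation. This sketch is our own construction, not from the text. It subdivides the big triangle into $N^2$ small triangles using barycentric grid points $(i, j, k)$ with $i + j + k = N$. Each vertex gets a random colour among the indices of its nonzero coordinates, which enforces exactly the boundary conditions of the lemma. The sketch then counts the tricolored triangles.

```python
import random


def sperner_tricolored_count(N, seed=0):
    """Random Sperner colouring of the N*N-triangle grid; count tricoloured triangles.

    Vertex (i, j) stands for (i, j, k) with k = N - i - j. Choosing the colour
    among the indices of nonzero coordinates gives V_c colour c, uses only a or b
    on the edge V_a V_b, and allows any colour in the interior."""
    rng = random.Random(seed)
    colour = {}
    for i in range(N + 1):
        for j in range(N + 1 - i):
            k = N - i - j
            allowed = [c for c, x in zip((1, 2, 3), (i, j, k)) if x > 0]
            colour[i, j] = rng.choice(allowed)
    count = 0
    for i in range(N):
        for j in range(N - i):
            up = {colour[i, j], colour[i + 1, j], colour[i, j + 1]}
            count += len(up) == 3
            if i + j <= N - 2:
                down = {colour[i + 1, j], colour[i, j + 1], colour[i + 1, j + 1]}
                count += len(down) == 3
    return count


print([sperner_tricolored_count(10, seed) for seed in range(8)])   # all odd
```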


With this lemma, it is easy to derive Brouwer’s theorem. 


Proof of Brouwer’s fixed point theorem (for $n = 2$). Let $\Delta$ be the triangle in $\mathbb{R}^3$ with vertices $e_1 = (1,0,0)$, $e_2 = (0,1,0)$, and $e_3 = (0,0,1)$.
It suffices to prove that every continuous map $f: \Delta \to \Delta$ has a fixed point,
since $\Delta$ is homeomorphic to the two-dimensional ball $B^2$.

We use $\delta(T)$ to denote the maximal length of an edge in a triangulation $T$.
One can easily construct an infinite sequence of triangulations $T_1, T_2, \ldots$
of $\Delta$ such that the sequence of maximal diameters $\delta(T_k)$ converges to 0.
Such a sequence can be obtained by explicit construction, or inductively,
for example by taking $T_{k+1}$ to be the barycentric subdivision of $T_k$.

For each of these triangulations, we define a 3-coloring of their vertices $v$
by setting $\lambda(v) := \min\{i : f(v)_i < v_i\}$, that is, $\lambda(v)$ is the smallest index $i$
such that the $i$-th coordinate of $f(v) - v$ is negative. If this smallest index $i$
does not exist, then we have found a fixed point and are done: To see this,
note that every $v \in \Delta$ lies in the plane $x_1 + x_2 + x_3 = 1$, hence $\sum_i v_i = 1$,
and likewise $\sum_i f(v)_i = 1$ since $f(v) \in \Delta$.
So if $f(v) \ne v$, then at least one of the coordinates of $f(v) - v$ must be
negative (and at least one must be positive).


Let us check that this coloring satisfies the assumptions of Sperner’s lemma. 
First, the vertex $e_i$ must receive color $i$, since the only possible negative
component of $f(e_i) - e_i$ is the $i$-th component. Moreover, if $v$ lies on the
edge opposite to $e_i$, then $v_i = 0$, so the $i$-th component of $f(v) - v$ cannot
be negative, and hence $v$ does not get the color $i$.


Sperner’s lemma now tells us that in each triangulation $T_k$ there is a tricolored triangle $\{v^{k:1}, v^{k:2}, v^{k:3}\}$ with $\lambda(v^{k:i}) = i$. The sequence of
points $(v^{k:1})_{k \ge 1}$ need not converge, but since the simplex $\Delta$ is compact
some subsequence has a limit point. After replacing the sequence of triangulations $T_k$ by the corresponding subsequence (which for simplicity
we also denote by $T_k$) we can assume that $(v^{k:1})_k$ converges to a point
$v \in \Delta$. Now the distance of $v^{k:2}$ and $v^{k:3}$ from $v^{k:1}$ is at most the mesh
length $\delta(T_k)$, which converges to 0. Thus the sequences $(v^{k:2})$ and $(v^{k:3})$
converge to the same point $v$.

But where is $f(v)$? We know that the first coordinate of $f(v^{k:1})$ is smaller
than that of $v^{k:1}$ for all $k$. Now since $f$ is continuous, we derive that the
first coordinate of $f(v)$ is smaller or equal to that of $v$. The same reasoning
works for the second and third coordinates. Thus none of the coordinates
of $f(v) - v$ is positive, and we have already seen that this contradicts
the assumption $f(v) \ne v$.
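The proof also suggests a way to locate approximate fixed points: colour a fine triangulation by $\lambda$ and search for a tricolored triangle. The sketch below assumes the regular grid triangulation of mesh about $1/N$ (not barycentric subdivisions). It uses exact `Fraction` arithmetic and numbers the colours 0, 1, 2. The map `shift` in the usage example is a hypothetical test function whose only fixed point is $(\frac13, \frac13, \frac13)$.

```python
from fractions import Fraction


def colour(f, v):
    """lambda(v): smallest index i (0-based) with f(v)_i < v_i, or None if f(v) = v."""
    fv = f(v)
    for i in range(3):
        if fv[i] < v[i]:
            return i
    return None


def approximate_fixed_point(f, N):
    """Colour the grid triangulation of the simplex (vertices (i, j, k)/N with
    i + j + k = N) by lambda and return the vertices of a tricoloured small
    triangle, or [v] if some grid vertex v is an exact fixed point."""
    def vertex(i, j):
        return (Fraction(i, N), Fraction(j, N), Fraction(N - i - j, N))

    lam = {}
    for i in range(N + 1):
        for j in range(N + 1 - i):
            c = colour(f, vertex(i, j))
            if c is None:
                return [vertex(i, j)]
            lam[i, j] = c
    for i in range(N):
        for j in range(N - i):
            triangles = [((i, j), (i + 1, j), (i, j + 1))]                  # "upward"
            if i + j <= N - 2:
                triangles.append(((i + 1, j), (i, j + 1), (i + 1, j + 1)))  # "downward"
            for tri in triangles:
                if {lam[q] for q in tri} == {0, 1, 2}:
                    return [vertex(*q) for q in tri]
    raise AssertionError("Sperner's lemma guarantees a tricoloured triangle")


def shift(x):
    """Hypothetical test map Delta -> Delta; its only fixed point is (1/3, 1/3, 1/3)."""
    return (x[1], x[2], x[0])


print([tuple(float(c) for c in v) for v in approximate_fixed_point(shift, 31)])
```

Refining $N$ shrinks the returned triangle toward the fixed point, which mirrors the limit argument in the proof.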
Tiling rectangles


Some mathematical theorems exhibit a special feature: The statement of 
the theorem is elementary and easy, but to prove it can turn out to be a tan- 
talizing task — unless you open some magic door and everything becomes 
clear and simple. 


One such example is the following result due to Nicolaas de Bruijn: 


Theorem. Whenever a rectangle is tiled by rectangles all of which 
have at least one side of integer length, then the tiled rectangle has 
at least one side of integer length. 


By a tiling we mean a covering of the big rectangle $R$ with rectangles
$T_1, \ldots, T_m$ that have pairwise disjoint interior, as in the picture to the right.
Actually, de Bruijn proved the following result about packing copies of an
$a \times b$ rectangle into a $c \times d$ rectangle: If $a, b, c, d$ are integers, then each of
$a$ and $b$ must divide one of $c$ or $d$. This is implied by two applications of
the more general theorem above to the given figure, scaled first by the
factor $\frac{1}{a}$ and then by the factor $\frac{1}{b}$. After the first scaling each small rectangle
has one side equal to 1, and so $\frac{c}{a}$ or $\frac{d}{a}$ must be an integer; the second scaling gives the same for $b$.

Almost everybody’s first attempt is to try induction on the number of small 
rectangles. Induction can be made to work, but it has to be performed 
very carefully, and it is not the most elegant option one can come up with. 
Indeed, in a delightful paper Stan Wagon surveys no less than fourteen 
different proofs out of which we have selected three; none of them needs 
induction. The first proof, essentially due to de Bruijn himself, makes use 
of a very clever calculus trick. The second proof by Richard Rochberg 
and Sherman Stein is a discrete version of the first proof, which makes it 
simpler still. But the champion may be the third proof suggested by Mike 
Paterson. It is just counting in two ways and almost one-line. 

In the following we assume that the big rectangle $R$ is placed parallel to the
x, y-axes with (0,0) as the lower left-hand corner. The small rectangles T; 
have then sides parallel to the axes as well. 


First Proof. Let $T$ be any rectangle in the plane, where $T$ extends from
$a$ to $b$ along the $x$-axis and from $c$ to $d$ along the $y$-axis. Here is de Bruijn’s
trick. Consider the double integral over $T$,

$$\int_c^d \int_a^b e^{2\pi i (x+y)}\,dx\,dy. \qquad (1)$$
The big rectangle has side lengths 11
and 8.5.
$\iint_R f(x,y) = \sum_i \iint_{T_i} f(x,y)$
Additivity of the integral


The amount of black in the corner
rectangle is $\min(x, \frac12)\min(y, \frac12) +
\max(x - \frac12, 0)\max(y - \frac12, 0)$, and this
is always greater than $\frac{xy}{2}$.


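Since

$$\int_c^d \int_a^b e^{2\pi i (x+y)}\,dx\,dy = \int_a^b e^{2\pi i x}\,dx \cdot \int_c^d e^{2\pi i y}\,dy,$$

it follows that the integral (1) is 0 if and only if at least one of $\int_a^b e^{2\pi i x}\,dx$
or $\int_c^d e^{2\pi i y}\,dy$ is equal to 0.
We are going to show that

$$\int_a^b e^{2\pi i x}\,dx = 0 \iff b - a \text{ is an integer}. \qquad (2)$$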


But then we will be done! Indeed, by the assumption on the tiling, each
$\iint_{T_i}$ is equal to 0, and so by additivity of the integral, $\iint_R = 0$ as well,
whence $R$ has an integer side.


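It remains to verify (2). From

$$\int_a^b e^{2\pi i x}\,dx = \frac{1}{2\pi i}\,e^{2\pi i x}\Big|_a^b = \frac{1}{2\pi i}\left(e^{2\pi i b} - e^{2\pi i a}\right) = \frac{e^{2\pi i a}}{2\pi i}\left(e^{2\pi i (b-a)} - 1\right)$$

we conclude that

$$\int_a^b e^{2\pi i x}\,dx = 0 \iff e^{2\pi i (b-a)} = 1.$$

From $e^{2\pi i x} = \cos 2\pi x + i \sin 2\pi x$ we see that the last equation is, in turn,
equivalent to

$$\cos 2\pi(b-a) = 1 \quad\text{and}\quad \sin 2\pi(b-a) = 0.$$

Since $\cos x = 1$ holds if and only if $x$ is an integer multiple of $2\pi$, we must
have $b - a \in \mathbb{Z}$, and this also implies $\sin 2\pi(b - a) = 0$.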
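The closed form above is easy to evaluate. This sketch computes $\int_a^b e^{2\pi i x}\,dx$ from the antiderivative. It computes the double integral (1) as the product of the two one-dimensional factors. The function names are our own. The integral vanishes up to rounding for the $11 \times 8.5$ rectangle of the figure, because one side is an integer.

```python
import cmath
import math


def integral_exp(a: float, b: float) -> complex:
    """Integral of e^{2 pi i x} over [a, b], via the antiderivative e^{2 pi i x}/(2 pi i)."""
    return (cmath.exp(2j * math.pi * b) - cmath.exp(2j * math.pi * a)) / (2j * math.pi)


def rectangle_integral(a: float, b: float, c: float, d: float) -> complex:
    """The double integral (1) over [a, b] x [c, d], as a product of two 1-D integrals."""
    return integral_exp(a, b) * integral_exp(c, d)


print(abs(integral_exp(0.3, 2.3)))             # side length 2: zero up to rounding
print(abs(integral_exp(0.3, 0.8)))             # side length 1/2: 1/pi, about 0.318
print(abs(rectangle_integral(0, 11, 0, 8.5)))  # the 11 x 8.5 rectangle: ~0
```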


Second Proof. Color the plane in a checkerboard fashion with black/white squares of size $\frac12 \times \frac12$, starting with a black square at $(0, 0)$.


By the assumption on the tiling every small rectangle $T_i$ must receive an
equal amount of black and white (along a side of integer length, every segment parallel to that side meets black and white in equal lengths, since the pattern has period 1 in each direction), and therefore the big rectangle $R$ too
contains the same amount of black and white.

But this implies that $R$ must have an integer side, since otherwise it can
be split into four pieces, three of which have equal amounts of black and
white, while the piece in the upper right-hand corner does not. Indeed, if
$R$ has side lengths $a$ and $b$ and we set $x = a - \lfloor a \rfloor$, $y = b - \lfloor b \rfloor$, so that $0 < x, y < 1$, then the amount of black
in that corner piece is always greater than that of white.


This is illustrated in the figure in the margin.
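The checkerboard argument has a neat algebraic form. Let $s(x) = +1$ on $[0, \frac12)$ and $-1$ on $[\frac12, 1)$, extended with period 1. Then a point is black exactly when $s(x)s(y) = +1$, so the black area minus the white area of $[x_1, x_2] \times [y_1, y_2]$ factors as $\int s\,dx \cdot \int s\,dy$. This reformulation and the function names below are our own. The sketch can be used to check the margin formula for the corner rectangle.

```python
import math


def square_wave_integral(t: float) -> float:
    """G(t) = integral over [0, t] of s(x), where s = +1 on [0, 1/2) and -1 on
    [1/2, 1), extended with period 1; G is a triangle wave vanishing at integers."""
    u = t - math.floor(t)
    return u if u <= 0.5 else 1 - u


def black_minus_white(x1, x2, y1, y2):
    """Black minus white area of [x1, x2] x [y1, y2] for the 1/2 x 1/2 checkerboard
    with a black square at (0, 0): the difference factors into two 1-D integrals."""
    return ((square_wave_integral(x2) - square_wave_integral(x1))
            * (square_wave_integral(y2) - square_wave_integral(y1)))


print(black_minus_white(0.37, 1.37, 0.2, 0.95))  # tile of width 1: 0 up to rounding
print(black_minus_white(0, 0.3, 0, 0.4))         # corner piece: positive (about 0.12)
print(black_minus_white(0, 11, 0, 8.5))          # the 11 x 8.5 rectangle: 0.0
```

For $R = [0, a] \times [0, b]$ the difference is $G(a)G(b)$, which vanishes exactly when $a$ or $b$ is an integer.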




Third proof. Let $C$ be the set of corners in the tiling for which both
coordinates are integral (so, for example, $(0,0) \in C$), and let $T$ be the
set of tiles. Form a bipartite graph $G$ on the vertex set $C \cup T$ by joining
each corner $c \in C$ to all the tiles of which it is a corner. The hypothesis
implies that each tile is joined to 0, 2, or 4 corners in $C$, since if one corner
is in $C$, then so is also the other end of any integer side. Now look at $C$.
Any $c \in C$ which is not a corner of $R$ is joined to an even number of tiles,
but the vertex $(0,0)$ is joined to only one tile. As the number of odd-degree
vertices in any finite graph is even (as we have just observed on page 204),
there must be another $c \in C$ of odd degree, and $c$ can only be one of the
other vertices of $R$, end of proof.
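The third proof is also a small computation. Tiles are given as axis-parallel rectangles $(x_1, y_1, x_2, y_2)$ with exact `Fraction` coordinates. This sketch builds the degrees of the integral corners in the bipartite graph and returns those of odd degree other than $(0,0)$. The naming is our own. It assumes, without checking, that the input is a tiling of $R = [0, a] \times [0, b]$ in which every tile has an integer side.

```python
from collections import Counter
from fractions import Fraction


def odd_integral_corners(tiles):
    """Degrees in the bipartite corner-tile graph, restricted to corners with
    integer coordinates; return the odd-degree corners other than (0, 0)."""
    degree = Counter()
    for x1, y1, x2, y2 in tiles:
        for corner in ((x1, y1), (x2, y1), (x1, y2), (x2, y2)):
            if all(c.denominator == 1 for c in corner):
                degree[corner] += 1
    return sorted(c for c, d in degree.items() if d % 2 == 1 and c != (0, 0))


# Hypothetical example: R = [0, 3/2] x [0, 2], tiled by three rectangles with a side of length 1.
tiles = [
    (Fraction(0), Fraction(0), Fraction(3, 2), Fraction(1)),
    (Fraction(0), Fraction(1), Fraction(1, 2), Fraction(2)),
    (Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(2)),
]
print(odd_integral_corners(tiles))   # [(Fraction(0, 1), Fraction(2, 1))]: R has height 2
```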


All three proofs can quite easily be adapted to also yield an $n$-dimensional
version of de Bruijn’s result: Whenever an $n$-dimensional box $R$ is tiled by
boxes all of which have at least one integer side, then $R$ has an integer side.


However, we want to keep our discussion in the plane (for this chapter), 
and look at a “companion result” to de Bruijn’s, due to Max Dehn (many 
years earlier), which sounds quite similar, but asks for different ideas. 


Theorem. A rectangle can be tiled with squares if and only if the 
ratio of its side lengths is a rational number. 


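One half of the theorem is immediate. Suppose the rectangle $R$ has side
lengths $\alpha$ and $\beta$ with $\frac{\alpha}{\beta} \in \mathbb{Q}$, that is, $\frac{\alpha}{\beta} = \frac{p}{q}$ with $p, q \in \mathbb{N}$. Setting
$s := \frac{\alpha}{p} = \frac{\beta}{q}$, we can easily tile $R$ with copies of the $s \times s$ square as shown
in the margin.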

For the proof of the converse Max Dehn used an elegant argument that he 
had already successfully employed in his solution of Hilbert’s third problem 
(see Chapter 10). In fact, the two papers appeared in successive years in the 
Mathematische Annalen. 


Proof. Suppose R is tiled by squares of possibly different sizes. By
scaling we may assume that R is an $\alpha \times 1$ rectangle. Let us assume $\alpha \notin \mathbb{Q}$
and derive a contradiction from this. The first step is to extend the sides of
the squares to the full width and height of R, as in the figure.


R is now decomposed into a number of small rectangles; let $a_1, a_2, \ldots, a_M$
be their side lengths (in any order), and consider the set

$$A := \{1, \alpha, a_1, \ldots, a_M\} \subseteq \mathbb{R}.$$


Here the bipartite graph G is drawn with vertices in C white, vertices in T black, and dashed edges.


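Margin figure: the rectangle tiled by $s \times s$ squares, with p squares along the side of length $\alpha$ and q squares along the side of length $\beta$.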
Linear extension:

$$f(q_1 b_1 + \cdots + q_m b_m) = q_1 f(b_1) + \cdots + q_m f(b_m) \quad \text{for } q_1, \ldots, q_m \in \mathbb{Q}.$$


Next comes a linear algebra part. We define V(A) as the vector space of
all linear combinations of the numbers in A with rational coefficients. Note
that V(A) contains all side lengths of the squares in the original tiling,
since any such side length is the sum of some $a_i$'s. As the number $\alpha$ is not
rational, we may extend $\{1, \alpha\}$ to a basis B of V(A),


$$B = \{b_1 = 1,\ b_2 = \alpha,\ b_3, \ldots, b_m\}.$$

Define the function $f : B \to \mathbb{R}$ by

$$f(1) := 1, \qquad f(\alpha) := -1, \qquad f(b_i) := 0 \ \text{ for } i \ge 3,$$

and extend it linearly to V(A).


The following definition of "area" of rectangles finishes the proof in three
quick steps: For $c, d \in V(A)$ the area of the $c \times d$ rectangle is defined as

$$\operatorname{area}(c \times d) := f(c)\, f(d).$$

(1) $\operatorname{area}((c_1 + c_2) \times d) = \operatorname{area}(c_1 \times d) + \operatorname{area}(c_2 \times d)$.

This follows immediately from the linearity of f. The analogous result
holds, of course, for vertical strips.

(2) $\operatorname{area}(R) = \sum_{\text{squares}} \operatorname{area}(\text{square})$, where the sum runs through the squares in
the tiling.

Just note that by (1) area(R) equals the sum of the areas of all small
rectangles in the extended tiling. Since any such rectangle is in exactly
one square of the original tiling, we see (again by (1)) that this sum is
also equal to the right-hand side of (2).


(3) We have

$$\operatorname{area}(R) = f(\alpha)\, f(1) = -1,$$

whereas for a square of side length t, $\operatorname{area}(t \times t) = f(t)^2 \ge 0$, and so

$$\sum_{\text{squares}} \operatorname{area}(\text{square}) \ge 0,$$

and this is our desired contradiction.
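The three steps can be replayed in a toy model. This sketch assumes, for illustration only, that every length lies in $\mathbb{Q} + \mathbb{Q}\alpha$, so that $B = \{1, \alpha\}$ and $f(r + s\alpha) = r - s$; in the proof B may contain further basis vectors $b_i$ with $f(b_i) = 0$, which changes nothing below. Lengths are stored as pairs of `Fraction`s, and the sample values are arbitrary.

```python
from fractions import Fraction as Fr

# Toy model (assumption): every length has the form r + s*alpha with r, s
# rational, stored as the pair (r, s); the basis B is just {1, alpha}.
ONE = (Fr(1), Fr(0))
ALPHA = (Fr(0), Fr(1))


def add(x, y):
    """Sum of the lengths r1 + s1*alpha and r2 + s2*alpha."""
    return (x[0] + y[0], x[1] + y[1])


def f(x):
    """The linear map with f(1) = 1 and f(alpha) = -1."""
    r, s = x
    return r - s


def area(c, d):
    """Dehn's 'area' of a c x d rectangle."""
    return f(c) * f(d)


# Step (1): cutting a c x d rectangle into strips of widths c1 and c2.
c1, c2, d = (Fr(1, 2), Fr(3)), (Fr(2), Fr(1, 3)), (Fr(5), Fr(1))
assert area(add(c1, c2), d) == area(c1, d) + area(c2, d)

# Step (3): area(R) = f(alpha) f(1) = -1, but every square has area f(t)^2 >= 0.
assert area(ALPHA, ONE) == -1
assert all(area(t, t) >= 0 for t in (c1, c2, d, ONE, ALPHA))
```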


For those who want to go for further excursions into the world of tilings the 
beautiful survey paper [1] by Federico Ardila and Richard Stanley is highly 
recommended. 




References 


[1] F. ARDILA & R. P. STANLEY: Tilings, Math. Intelligencer 32 (2010), no. 4, 32-43.


[2] N. G. DE BRUIJN: Filling boxes with bricks, Amer. Math. Monthly 76 (1969), 37-40.


[3] M. DEHN: Über die Zerlegung von Rechtecken in Rechtecke, Mathematische Annalen 57 (1903), 314-332.


[4] S. WAGON: Fourteen proofs of a result about tiling a rectangle, Amer. Math. Monthly 94 (1987), 601-617.


“Don’t hit the integers!”


Three famous theorems on finite sets


Chapter 30


In this chapter we are concerned with a basic theme of combinatorics:
properties and sizes of special families F of subsets of a finite set
$N = \{1, 2, \ldots, n\}$. We start with two results which are classics in the field: the
theorems of Sperner and of Erdős–Ko–Rado. These two results have in
common that they were reproved many times and that each of them initiated
a new field of combinatorial set theory. For both theorems, induction
seems to be the natural method, but the arguments we are going to discuss
are quite different and truly inspired.


In 1928 Emanuel Sperner asked and answered the following question: Suppose
we are given the set $N = \{1, 2, \ldots, n\}$. Call a family F of subsets of
N an antichain if no set of F contains another set of the family F. What is
the size of a largest antichain? Clearly, the family $F_k$ of all k-sets satisfies
the antichain property with $|F_k| = \binom{n}{k}$. Looking at the maximum of the
binomial coefficients (see page 14) we conclude that there is an antichain
of size $\binom{n}{\lfloor n/2 \rfloor} = \max_k \binom{n}{k}$. Sperner's theorem now asserts that there are
no larger ones.


Theorem 1. The size of a largest antichain of an n-set is $\binom{n}{\lfloor n/2 \rfloor}$.


Emanuel Sperner 


Proof. Of the many proofs the following one, due to David Lubell, is
probably the shortest and most elegant. Let F be an arbitrary antichain.
Then we have to show $|F| \le \binom{n}{\lfloor n/2 \rfloor}$. The key to the proof is that we
consider chains of subsets $\emptyset = C_0 \subset C_1 \subset C_2 \subset \cdots \subset C_n = N$, where
$|C_i| = i$ for $i = 0, \ldots, n$. How many chains are there? Clearly, we obtain
a chain by adding one by one the elements of N, so there are just as many
chains as there are permutations of N, namely n!. Next, for a set $A \in F$
we ask how many of these chains contain A. Again this is easy. To get
from $\emptyset$ to A we have to add the elements of A one by one, and then to pass
from A to N we have to add the remaining elements. Thus if A contains k
elements, then by considering all these pairs of chains linked together we
see that there are precisely $k!\,(n-k)!$ such chains. Note that no chain can
pass through two different sets A and B of F, since F is an antichain.
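The chain count in Lubell's argument is easy to confirm by enumeration. A small sketch, assuming the ground set is {0, ..., n-1} instead of {1, ..., n} (purely a Python indexing convenience); `chains_through` is an illustrative name. A maximal chain is encoded by the order in which the elements are added.

```python
from itertools import permutations
from math import factorial


def chains_through(A, n):
    """Count the maximal chains {} = C_0 < C_1 < ... < C_n = N that contain A.

    N is taken as {0, ..., n-1}. A maximal chain is the same as an ordering
    (permutation) of N, with C_i = the set of its first i elements, so A lies
    on the chain exactly when the first |A| elements form the set A.
    """
    A = frozenset(A)
    return sum(1 for order in permutations(range(n))
               if frozenset(order[:len(A)]) == A)


n = 5
for A in [set(), {3}, {0, 2}, {1, 2, 4}, set(range(n))]:
    k = len(A)
    assert chains_through(A, n) == factorial(k) * factorial(n - k)
```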

To complete the proof, let $m_k$ be the number of k-sets in F. Thus
$|F| = \sum_{k=0}^{n} m_k$. Then it follows from our discussion that the number of chains
passing through some member of F is

$$\sum_{k=0}^{n} m_k\, k!\,(n-k)!,$$

and this expression cannot exceed the number n! of all chains. Hence
we conclude

$$\sum_{k=0}^{n} m_k\,\frac{k!\,(n-k)!}{n!} \le 1, \qquad\text{or}\qquad \sum_{k=0}^{n} \frac{m_k}{\binom{n}{k}} \le 1.$$

Replacing the denominators by the largest binomial coefficient, we therefore
obtain

$$\frac{1}{\binom{n}{\lfloor n/2\rfloor}} \sum_{k=0}^{n} m_k \le 1, \qquad\text{that is,}\qquad |F| = \sum_{k=0}^{n} m_k \le \binom{n}{\lfloor n/2\rfloor},$$

and the proof is complete.


Check that the family of all $\frac{n}{2}$-sets for even n, respectively the two families of all $\frac{n-1}{2}$-sets and of all $\frac{n+1}{2}$-sets when n is odd, are indeed the only antichains that achieve the maximum size!


A circle C for n = 6. The bold edges depict an arc of length 3.
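For very small n, Sperner's theorem can be confirmed by brute force. The sketch below (illustrative names, ground set {0, ..., n-1}) enumerates every antichain by adding subsets in a fixed order, so it is exponential and only meant as a sanity check for n up to about 5, not as a proof.

```python
from itertools import combinations
from math import comb


def largest_antichain(n):
    """Size of a largest antichain among the subsets of {0, ..., n-1},
    found by exhaustive search (exponential, so only for very small n)."""
    subsets = [frozenset(c) for k in range(n + 1)
               for c in combinations(range(n), k)]
    best = 0

    def extend(start, chosen):
        nonlocal best
        best = max(best, len(chosen))
        for i in range(start, len(subsets)):
            s = subsets[i]
            # keep s only if it is incomparable with every set chosen so far
            if all(not (s <= t or t <= s) for t in chosen):
                chosen.append(s)
                extend(i + 1, chosen)
                chosen.pop()

    extend(0, [])
    return best


for n in range(5):
    assert largest_antichain(n) == comb(n, n // 2)
```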


Our second result is of an entirely different nature. Again we consider the
set $N = \{1, \ldots, n\}$. Call a family F of subsets an intersecting family if any
two sets in F have at least one element in common. It is almost immediate
that the size of a largest intersecting family is $2^{n-1}$. If $A \in F$, then the
complement $A^c = N \setminus A$ has empty intersection with A and accordingly
cannot be in F. Hence we conclude that an intersecting family contains at
most half the number $2^n$ of all subsets, that is, $|F| \le 2^{n-1}$. On the other
hand, if we consider the family of all sets containing a fixed element, say
the family $F_1$ of all sets containing 1, then clearly $|F_1| = 2^{n-1}$, and the
problem is settled.


But now let us ask the following question: How large can an intersecting
family F be if all sets in F have the same size, say k? Let us call such families
intersecting k-families. To avoid trivialities, we assume $n \ge 2k$, since
otherwise any two k-sets intersect, and there is nothing to prove. Taking
up the above idea, we certainly obtain such a family $F_1$ by considering all
k-sets containing a fixed element, say 1. Clearly, we obtain all sets in $F_1$
by adding to 1 all $(k-1)$-subsets of $\{2, 3, \ldots, n\}$, hence $|F_1| = \binom{n-1}{k-1}$.
Can we do better? No, and this is the theorem of Erdős–Ko–Rado.


Theorem 2. The largest size of an intersecting k-family in an n-set is $\binom{n-1}{k-1}$
when $n \ge 2k$.


Paul Erdős, Chao Ko and Richard Rado found this result in 1938, but it
was not published until 23 years later. Since then multitudes of proofs and 
variants have been given, but the following argument due to Gyula Katona 
is particularly elegant. 


Proof. The key to the proof is the following simple lemma, which at
first sight seems to be totally unrelated to our problem. Consider a circle C
divided by n points into n edges. Let an arc of length k consist of k + 1
consecutive points and the k edges between them.


Lemma. Let $n \ge 2k$, and suppose we are given t distinct arcs $A_1, \ldots, A_t$
of length k, such that any two arcs have an edge in common. Then $t \le k$.


To prove the lemma, note first that any point of C is the endpoint of at most
one arc. Indeed, if $A_i, A_j$ had a common endpoint v, then they would have
to start in different directions (since they are distinct). But then they cannot
have an edge in common as $n \ge 2k$. Let us fix $A_1$. Since any $A_j$ ($j \ge 2$)
has an edge in common with $A_1$, one of the endpoints of $A_j$ is an inner
point of $A_1$. Since these endpoints must be distinct as we have just seen,
and since $A_1$ contains $k - 1$ inner points, we conclude that there can be at
most $k - 1$ further arcs, and thus at most k arcs altogether.
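Katona's lemma is also easy to test exhaustively for small circles. In this sketch (hypothetical helper names), the points are 0, ..., n-1, edge e joins points e and e+1 modulo n, and an arc of length k is represented by its set of k edges; the search runs over all $2^n$ families of arcs.

```python
from itertools import combinations


def arcs(n, k):
    """The n arcs of length k on a circle with points 0..n-1; edge e joins
    points e and e+1 (mod n), and an arc is stored as its set of k edges."""
    return [frozenset((start + j) % n for j in range(k)) for start in range(n)]


def max_pairwise_overlapping(n, k):
    """Largest number of distinct arcs of length k any two of which share an
    edge (brute force over all 2^n families of arcs)."""
    all_arcs = arcs(n, k)
    best = 0
    for mask in range(1 << n):
        chosen = [a for i, a in enumerate(all_arcs) if (mask >> i) & 1]
        if all(a & b for a, b in combinations(chosen, 2)):
            best = max(best, len(chosen))
    return best
```

The k arcs that start at points 0, 1, ..., k-1 all contain edge k-1, so the bound t ≤ k of the lemma is attained.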


Now we proceed with the proof of the Erdős–Ko–Rado theorem. Let F be
an intersecting k-family. Consider a circle C with n points and n edges as
above. We take any cyclic permutation $\pi = (a_1, a_2, \ldots, a_n)$ and write the
numbers $a_i$ clockwise next to the edges of C. Let us count the number of
sets $A \in F$ which appear as k consecutive numbers on C. Since F is an
intersecting family we see by our lemma that we get at most k such sets.
Since this holds for any cyclic permutation, and since there are $(n-1)!$
cyclic permutations, we produce in this way at most

$$k\,(n-1)!$$

sets of F which appear as consecutive elements of some cyclic permutation.
How often do we count a fixed set $A \in F$? Easy enough: A appears in $\pi$
if the k elements of A appear consecutively in some order. Hence we have
k! possibilities to write A consecutively, and $(n-k)!$ ways to order the
remaining elements. So we conclude that a fixed set A appears in precisely
$k!\,(n-k)!$ cyclic permutations, and hence that

$$|F| \le \frac{k\,(n-1)!}{k!\,(n-k)!} = \frac{(n-1)!}{(k-1)!\,(n-1-(k-1))!} = \binom{n-1}{k-1}.$$
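The Erdős–Ko–Rado bound can likewise be checked by exhaustive search for tiny parameters. This branch-and-bound sketch (illustrative names, exponential time) finds the largest family of pairwise intersecting k-subsets of {0, ..., n-1}; the test compares it with $\binom{n-1}{k-1}$, including the case n = 2k.

```python
from itertools import combinations
from math import comb


def largest_intersecting_family(n, k):
    """Largest family of pairwise intersecting k-subsets of {0, ..., n-1},
    by branch-and-bound search (exponential; tiny n only)."""
    sets = [frozenset(c) for c in combinations(range(n), k)]
    best = 0

    def extend(start, chosen):
        nonlocal best
        best = max(best, len(chosen))
        if len(chosen) + len(sets) - start <= best:
            return  # even taking every remaining set cannot beat best
        for i in range(start, len(sets)):
            if all(sets[i] & t for t in chosen):
                chosen.append(sets[i])
                extend(i + 1, chosen)
                chosen.pop()

    extend(0, [])
    return best
```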


Again we may ask whether the families containing a fixed element are the
only intersecting k-families of maximal size. This is certainly not true for
n = 2k. For example, for n = 4 and k = 2 the family {1, 2}, {1, 3}, {2, 3}
also has size $\binom{3}{1} = 3$. More generally, for n = 2k we get the largest
intersecting k-families, of size $\frac{1}{2}\binom{n}{k} = \binom{n-1}{k-1}$, by arbitrarily including one
out of every pair of sets formed by a k-set A and its complement $N \setminus A$. But
for n > 2k the special families containing a fixed element are indeed the
only ones. The reader is invited to try his hand at the proof.


Finally, we turn to the third result which is arguably the most important 
basic theorem in finite set theory, the “marriage theorem” of Philip Hall 
proved in 1935. It opened the door to what is today called matching theory, 
with a wide variety of applications, some of which we shall see as we 
go along. 


Consider a finite set X and a collection $A_1, \ldots, A_n$ of subsets of X (which
need not be distinct). Let us call a sequence $x_1, \ldots, x_n$ a system of distinct
representatives of $\{A_1, \ldots, A_n\}$ if the $x_i$ are distinct elements of X, and
if $x_i \in A_i$ for all i. Of course, such a system, abbreviated SDR, need not
exist, for example when one of the sets $A_i$ is empty. The content of the
theorem of Hall is the precise condition under which an SDR exists.


An intersecting family for n = 4, k = 2


“A mass wedding”


{B, C, D} is a critical family


Before giving the result let us state the human interpretation which gave it
the folklore name marriage theorem: Consider a set $\{1, \ldots, n\}$ of girls and
a set X of boys. Whenever $x \in A_i$, then girl i and boy x are inclined to
get married, thus $A_i$ is just the set of possible matches of girl i. An SDR
represents then a mass wedding where every girl marries a boy she likes.


Back to sets, here is the statement of the result. 


Theorem 3. Let $A_1, \ldots, A_n$ be a collection of subsets of a finite set X.
Then there exists a system of distinct representatives if and only if the union
of any m sets $A_i$ contains at least m elements, for $1 \le m \le n$.


The condition is clearly necessary: If m sets $A_i$ contain between them
fewer than m elements, then these m sets can certainly not be represented 
by distinct elements. The surprising fact (resulting in the universal ap- 
plicability) is that this obvious condition is also sufficient. Hall’s original 
proof was rather complicated, and subsequently many different proofs were 
given, of which the following one (due to Easterfield and rediscovered by 
Halmos and Vaughan) may be the most natural. 


Proof. We use induction on n. For n = 1 there is nothing to prove. Let
n > 1, and suppose $\{A_1, \ldots, A_n\}$ satisfies the condition of the theorem,
which we abbreviate by (H). Call a collection of $\ell$ sets $A_i$ with $1 \le \ell < n$ a
critical family if its union has cardinality $\ell$. Now we distinguish two cases.


Case 1: There is no critical family. 


Choose any element $x \in A_n$. Delete x from X and consider the collection
$A'_1, \ldots, A'_{n-1}$ with $A'_i = A_i \setminus \{x\}$. Since there is no critical family, we find
that the union of any m sets $A'_i$ contains at least m elements. Hence by
induction on n there exists an SDR $x_1, \ldots, x_{n-1}$ of $\{A'_1, \ldots, A'_{n-1}\}$, and
together with $x_n = x$, this gives an SDR for the original collection.


Case 2: There exists a critical family. 


After renumbering the sets we may assume that $\{A_1, \ldots, A_\ell\}$ is a critical
family. Then $\bigcup_{i=1}^{\ell} A_i = \tilde{X}$ with $|\tilde{X}| = \ell$. Since $\ell < n$, we infer the existence
of an SDR for $A_1, \ldots, A_\ell$ by induction, that is, there is a numbering
$x_1, \ldots, x_\ell$ of $\tilde{X}$ such that $x_i \in A_i$ for all $i \le \ell$.


Consider now the remaining collection $A_{\ell+1}, \ldots, A_n$, and take any m of
these sets. Since the union of $A_1, \ldots, A_\ell$ and these m sets contains at least
$\ell + m$ elements by condition (H), we infer that the m sets contain at least
m elements outside $\tilde{X}$. In other words, condition (H) is satisfied for the
family

$$A_{\ell+1} \setminus \tilde{X}, \ \ldots, \ A_n \setminus \tilde{X}.$$

Induction now gives an SDR for $A_{\ell+1}, \ldots, A_n$ that avoids $\tilde{X}$. Combining
it with $x_1, \ldots, x_\ell$ we obtain an SDR for all sets $A_i$. This completes
the proof.
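The proof is effectively a recursive algorithm. The sketch below follows its two cases literally, assuming the sets are small Python sets of mutually comparable elements (it uses `min` to pick x); the function names are illustrative, and the exhaustive search for critical families makes it exponential. For large instances one would instead use an augmenting-path matching algorithm, which is a different technique from the proof given here.

```python
from itertools import combinations


def union(sets, indices):
    """Union of the sets with the given indices."""
    return set().union(*(sets[i] for i in indices))


def satisfies_hall(sets):
    """Condition (H): any m of the sets together contain at least m elements."""
    n = len(sets)
    return all(len(union(sets, idx)) >= m
               for m in range(1, n + 1) for idx in combinations(range(n), m))


def find_critical(sets):
    """Indices of a critical family: l < n sets whose union has exactly l elements."""
    n = len(sets)
    for l in range(1, n):
        for idx in combinations(range(n), l):
            if len(union(sets, idx)) == l:
                return list(idx)
    return None


def sdr(sets):
    """Distinct representatives x_i in A_i, following the two cases of the
    inductive proof. Assumes (H) holds; exponential time, small inputs only."""
    n = len(sets)
    if n == 0:
        return []
    critical = find_critical(sets)
    if critical is None:
        # Case 1: pick any x in A_n, delete it from the other sets, recurse.
        x = min(sets[-1])
        return sdr([a - {x} for a in sets[:-1]]) + [x]
    # Case 2: represent the critical family inside its union X~, then the
    # remaining sets with X~ removed.
    x_tilde = union(sets, critical)
    rest = [i for i in range(n) if i not in critical]
    reps = [None] * n
    for i, x in zip(critical, sdr([sets[i] for i in critical])):
        reps[i] = x
    for i, x in zip(rest, sdr([sets[i] - x_tilde for i in rest])):
        reps[i] = x
    return reps
```

For example, `sdr([{1}, {1, 2}, {1, 2, 3}])` first finds the critical family {1} and ends with the representatives 1, 2, 3.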
As we mentioned, Hall's theorem was the beginning of the now vast field
of matching theory [6]. Of the many variants and ramifications let us state
one particularly appealing result which the reader is invited to prove for
himself:


Suppose the sets $A_1, \ldots, A_n$ all have size $k \ge 1$ and suppose
further that no element is contained in more than k sets. Then
there exist k SDR's such that for any i the k representatives of $A_i$
are distinct and thus together form the set $A_i$.


A beautiful result which should open new horizons on marriage possibilities.
Shuffling cards


How often does one have to shuffle a deck of cards until it is random? 


The analysis of random processes is a familiar duty in life (“How long does 
it take to get to the airport during rush-hour?”) as well as in mathematics. 
Of course, getting meaningful answers to such problems heavily depends 
on formulating meaningful questions. For the card shuffling problem, this 
means that we have 


• to specify the size of the deck (n = 52 cards, say),


• to say how we shuffle (we'll analyze top-in-at-random shuffles first,
and then the more realistic and effective riffle shuffles), and finally


• to explain what we mean by “is random” or “is close to random.”


So our goal in this chapter is an analysis of the riffle shuffle, due to Edgar 
N. Gilbert and Claude Shannon (1955, unpublished) and Jim Reeds (1981, 
unpublished), following the statistician David Aldous and the former ma- 
gician turned mathematician Persi Diaconis according to [1]. We will not 
reach the final precise result that 7 riffle shuffles are sufficient to get a deck 
of n = 52 cards very close to random, while 6 riffle shuffles do not suf- 
fice — but we will obtain an upper bound of 12, and we will see some 
extremely beautiful ideas on the way: the concepts of stopping rules and 
of “strong uniform time,” the lemma that strong uniform time bounds the 
variation distance, Reeds’ inversion lemma, and thus the interpretation of 
shuffling as “reversed sorting.” In the end, everything will be reduced to 
two very classical combinatorial problems, namely the coupon collector 
and the birthday paradox. So let’s start with these! 


The birthday paradox 


Take n random people (the participants of a class or seminar, say). What
is the probability that they all have different birthdays? With the usual
simplifying assumptions (365 days a year, no seasonal effects, no twins
present) the probability is

$$\prod_{i=1}^{n-1}\Big(1 - \frac{i}{365}\Big),$$


© Springer-Verlag GmbH Germany, part of Springer Nature 2018 


Chapter 31 




Persi Diaconis’ business card as a magician. In a later interview he said: “If you
say that you are a professor at Stanford people treat you respectfully. If you say
that you invent magic tricks, they don’t want to introduce you to their daughter.”
$$\sum_{s \ge 1} s\,x^{s-1}(1-x) = \sum_{s \ge 1} s\,x^{s-1} - \sum_{s \ge 1} s\,x^{s} = \sum_{s \ge 0} (s+1)\,x^{s} - \sum_{s \ge 0} s\,x^{s} = \sum_{s \ge 0} x^{s} = \frac{1}{1-x},$$

where at the end we sum a geometric series (see page 48).


which is smaller than 1/2 for n = 23 (this is the “birthday paradox”!), less
than 9 percent for n = 42, and exactly 0 for n > 365 (the “pigeon-hole
principle,” see Chapter 28). The formula is easy to see if we take the
persons in some fixed order: If the first i persons have distinct birthdays,
then the probability that the (i + 1)-st person doesn't spoil the series is
$1 - \frac{i}{365}$, since there are $365 - i$ birthdays left.


Similarly, if n balls are placed independently and randomly into k boxes,
then the probability that no box gets more than one ball is

$$p(n, k) = \prod_{i=1}^{n-1}\Big(1 - \frac{i}{k}\Big).$$
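The birthday numbers quoted above follow directly from p(n, k). A short sketch in floating-point arithmetic (the name `p_distinct` is illustrative):

```python
def p_distinct(n, k):
    """p(n, k) = prod_{i=1}^{n-1} (1 - i/k): probability that n balls placed
    independently and uniformly into k boxes all land in different boxes."""
    prob = 1.0
    for i in range(1, n):
        if i >= k:
            return 0.0  # pigeon-hole: more balls than boxes
        prob *= 1 - i / k
    return prob


# Smallest group size for which a shared birthday is more likely than not.
threshold = min(n for n in range(1, 366) if p_distinct(n, 365) < 0.5)
```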


The coupon collector 


Children buy photos of pop stars (or soccer stars) for their albums, but 
they buy them in little nontransparent envelopes, so they don’t know which 
photo they will get. If there are n different photos, what is the expected 
number of pictures a kid has to buy until he or she gets every motif at 
least once? 


Equivalently, if you randomly take balls from a bowl that contains n dis- 
tinguishable balls, and if you put your ball back each time, and then again 
mix well, how often do you have to draw on average until you have drawn 
each ball at least once? 


If you already have drawn k distinct balls, then the probability not to get
a new one in the next drawing is $\frac{k}{n}$. So the probability to need exactly s
drawings for the next new ball is $\big(\frac{k}{n}\big)^{s-1}\big(1 - \frac{k}{n}\big)$. And thus the expected
number of drawings for the next new ball is

$$\sum_{s \ge 1} \Big(\frac{k}{n}\Big)^{s-1}\Big(1 - \frac{k}{n}\Big)\, s = \frac{1}{1 - \frac{k}{n}},$$


as we get from the series in the margin. So the expected number of drawings 
until we have drawn each of the n different balls at least once is 


$$\sum_{k=0}^{n-1} \frac{1}{1 - \frac{k}{n}} = \frac{n}{n} + \frac{n}{n-1} + \cdots + \frac{n}{1} = n H_n \approx n \log n,$$


with the bounds on the size of harmonic numbers that we had obtained on 
page 13. So the answer to the coupon collector’s problem is that we have 
to expect that roughly n log n drawings are necessary. 
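The coupon collector's expectation $nH_n$ can be computed exactly, and a quick simulation agrees with it. In this sketch the names are illustrative, the simulation uses Python's `random` module with a fixed seed, and the simulated mean is of course only approximate.

```python
import random
from fractions import Fraction


def expected_draws(n):
    """Exact E[V_n] = sum_{k=0}^{n-1} n/(n-k) = n * H_n."""
    return sum(Fraction(n, n - k) for k in range(n))


def simulate_draws(n, rng):
    """One run: draw with replacement until all n balls have been seen."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


rng = random.Random(0)
runs = [simulate_draws(10, rng) for _ in range(2000)]
print(float(expected_draws(10)), sum(runs) / len(runs))
```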

The estimate that we need in the following is for the probability that you
need significantly more than n log n trials. If $V_n$ denotes the number of
drawings needed (this is the random variable whose expected value is
$E[V_n] \approx n \log n$), then for $n \ge 1$ and $c \ge 0$, the probability that we
need more than $m := \lceil n \log n + cn \rceil$ drawings is

$$\operatorname{Prob}[V_n > m] \le e^{-c}.$$
Indeed, if $A_i$ denotes the event that the ball i is not drawn in the first m
drawings, then

$$\operatorname{Prob}[V_n > m] = \operatorname{Prob}\Big[\bigcup_i A_i\Big] \le \sum_i \operatorname{Prob}[A_i] = n\Big(1 - \frac{1}{n}\Big)^m < n\, e^{-m/n} \le e^{-c}.$$
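To see how generous the union bound is, one can compare it with the exact tail probability, which inclusion-exclusion over the events $A_i$ gives as $\sum_{j=1}^{n}(-1)^{j+1}\binom{n}{j}(1-j/n)^m$. This exact formula is an addition for illustration, not part of the argument above; the sketch (illustrative names) evaluates both quantities with exact fractions.

```python
from fractions import Fraction
from math import ceil, comb, exp, log


def tail_probability(n, m):
    """Exact Prob[V_n > m] (some ball still missing after m draws), by
    inclusion-exclusion over the events A_i = 'ball i not drawn'."""
    total = Fraction(0)
    for j in range(1, n + 1):
        total += (-1) ** (j + 1) * comb(n, j) * Fraction(n - j, n) ** m
    return total


def union_bound(n, m):
    """The bound n (1 - 1/n)^m from the union of the events A_i."""
    return n * Fraction(n - 1, n) ** m


n, c = 52, 1
m = ceil(n * log(n) + c * n)
print(float(tail_probability(n, m)), float(union_bound(n, m)), exp(-c))
```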
Now let's grab a deck of n cards. We number them 1 up to n in the order
in which they come, so the card numbered “1” is at the top of the
deck, while “n” is at the bottom. Let us denote from now on by $\mathfrak{S}_n$ the
set of all permutations of $1, \ldots, n$. Shuffling the deck amounts to the application
of certain random permutations to the order of the cards. Ideally,
this might mean that we apply an arbitrary permutation $\pi \in \mathfrak{S}_n$ to
our starting order $(1, 2, \ldots, n)$, each of them with the same probability $\frac{1}{n!}$.
Thus, after doing this just once, we would have our deck of cards in order
$\pi = (\pi(1), \pi(2), \ldots, \pi(n))$, and this would be a perfect random order. But
that's not what happens in real life. Rather, when shuffling only “certain”
permutations occur, perhaps not all of them with the same probability, and
this is repeated a “certain” number of times. After that, we expect or hope
the deck to be at least “close to random.”


Top-in-at-random shuffles 


These are performed as follows: you take the top card from the deck, and
insert it into the deck at one of the n distinct possible places, each of them
with probability $\frac{1}{n}$. Thus one of the permutations

$$\pi_i = (2, 3, \ldots, i, 1, i+1, \ldots, n)$$

is applied, $1 \le i \le n$. After one such shuffle the deck doesn't look random,
and indeed we expect to need lots of such shuffles until we reach that goal.
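The permutations $\pi_i$ are easy to express in code. A minimal sketch, with decks as Python lists written from top to bottom and illustrative function names; the random run uses a seeded `random.Random` instance, so its particular outcome is arbitrary.

```python
import random


def top_in_at_random(deck, i):
    """Apply pi_i = (2, 3, ..., i, 1, i+1, ..., n): the top card moves to
    position i (1-based) and all other cards keep their relative order."""
    if not 1 <= i <= len(deck):
        raise ValueError("the insertion place must satisfy 1 <= i <= n")
    top, rest = deck[0], deck[1:]
    return rest[:i - 1] + [top] + rest[i - 1:]


def random_run(n, shuffles, rng):
    """Decks seen during a run of top-in-at-random shuffles; each insertion
    place is chosen uniformly, that is, with probability 1/n."""
    deck = list(range(1, n + 1))
    history = [deck]
    for _ in range(shuffles):
        deck = top_in_at_random(deck, rng.randint(1, n))
        history.append(deck)
    return history


for deck in random_run(5, 6, random.Random(2024)):
    print(deck)
```

Note that $\pi_1$ is the identity: inserting the top card at the top changes nothing.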


A typical run of top-in-at-random shuffles may look as follows (for n = 5): 


A little calculus shows that $(1 - \frac{1}{n})^n$ is
an increasing function in n, which converges to 1/e. So $(1 - \frac{1}{n})^n < \frac{1}{e}$ holds
for all $n \ge 1$.


Figure: a typical run of top-in-at-random shuffles for n = 5.


“Top-in-at-random”


How should we measure “being close to random”? Probabilists have cooked
up the “variation distance” as a rather unforgiving measure of randomness:
We look at the probability distribution on the n! different orderings of our
deck, or equivalently, on the n! different permutations $\sigma \in \mathfrak{S}_n$ that yield
the orderings.
For card players, the question is not “exactly how close to uniform is the
deck after a million riffle shuffles?”, but “is 7 shuffles enough?”
(Aldous & Diaconis [1])


Margin figure: schematic sketch of d(k) as a function of k, with the cut-off at $k_0$.


Two examples are our starting distribution E, which is given by

$$E(\mathrm{id}) = 1, \qquad E(\pi) = 0 \ \text{ otherwise},$$

and the uniform distribution U given by

$$U(\pi) = \frac{1}{n!} \quad \text{for all } \pi \in \mathfrak{S}_n.$$


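The variation distance between two probability distributions $Q_1$ and $Q_2$ is
now defined as

$$\|Q_1 - Q_2\| := \frac{1}{2} \sum_{\pi \in \mathfrak{S}_n} |Q_1(\pi) - Q_2(\pi)|.$$

By setting $S := \{\pi \in \mathfrak{S}_n : Q_1(\pi) > Q_2(\pi)\}$ and using $\sum_\pi Q_1(\pi) = \sum_\pi Q_2(\pi) = 1$ we can rewrite this as

$$\|Q_1 - Q_2\| = \max_{S \subseteq \mathfrak{S}_n} |Q_1(S) - Q_2(S)|,$$

with $Q_i(S) := \sum_{\pi \in S} Q_i(\pi)$. Clearly we have $0 \le \|Q_1 - Q_2\| \le 1$.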
In the following, “being close to random” will be interpreted as “having 
small variation distance from the uniform distribution.” Here the distance 
between the starting distribution and the uniform distribution is very close 
to 1: 


$$\|E - U\| = 1 - \frac{1}{n!}.$$

After one top-in-at-random shuffle, this will not be much better:

$$\|\mathrm{Top} - U\| = 1 - \frac{n}{n!}.$$
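Both expressions for the variation distance, and the two values just stated, can be checked exactly for small decks. In this sketch (illustrative names) a distribution is a dictionary from permutations, written as tuples, to `Fraction` probabilities, with missing keys meaning probability 0.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def variation_distance(q1, q2):
    """||Q1 - Q2|| = 1/2 * sum |Q1(pi) - Q2(pi)|; distributions are dicts from
    permutations (tuples) to probabilities, missing keys meaning 0."""
    support = set(q1) | set(q2)
    return Fraction(1, 2) * sum(abs(q1.get(p, 0) - q2.get(p, 0)) for p in support)


def gain_on_S(q1, q2):
    """Q1(S) - Q2(S) for S = {pi : Q1(pi) > Q2(pi)}, the maximizing set."""
    support = set(q1) | set(q2)
    return sum(q1.get(p, 0) - q2.get(p, 0)
               for p in support if q1.get(p, 0) > q2.get(p, 0))


def top_in_at_random(deck, i):
    """pi_i on a deck given as a tuple (top card first)."""
    return deck[1:i] + (deck[0],) + deck[i:]


def example_distributions(n):
    """The distributions E, U and Top on decks of n cards."""
    identity = tuple(range(1, n + 1))
    E = {identity: Fraction(1)}
    U = {p: Fraction(1, factorial(n)) for p in permutations(identity)}
    Top = {}
    for i in range(1, n + 1):
        p = top_in_at_random(identity, i)
        Top[p] = Top.get(p, 0) + Fraction(1, n)
    return E, U, Top
```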


The probability distribution on $\mathfrak{S}_n$ that we obtain by applying the top-in-at-random
shuffle k times will be denoted by $\mathrm{Top}^{*k}$. So how does $\|\mathrm{Top}^{*k} - U\|$
behave if k gets larger, that is, if we repeat the shuffling? And similarly for
other types of shuffling? General theory (in particular, Markov chains on
finite groups; see e.g. Behrends [3]) implies that for large k the variation
distance $d(k) := \|\mathrm{Top}^{*k} - U\|$ goes to zero exponentially fast, but it does
not yield the “cut-off” phenomenon that one observes in practice: After a
certain number $k_0$ of shuffles “suddenly” d(k) goes to zero rather fast. Our
margin displays a schematic sketch of the situation.


Strong uniform stopping rules 


The amazing idea of strong uniform stopping rules by Aldous and Diaconis 
captures the essential features. Imagine that the casino manager closely 
watches the shuffling process, analyzes the specific permutations that are 
applied to the deck in each step, and after a number of steps that depends on 
the permutations that he has seen calls “STOP!”. So he has a stopping rule 
that ends the shuffling process. It depends only on the (random) shuffles 
that have already been applied. The stopping rule is strong uniform if the 
following condition holds for all $k \ge 0$:


If the process is stopped after exactly k steps, then the resulting 
permutations of the deck have uniform distribution (exactly!). 
Let T be the number of steps that are performed until the stopping rule
tells the manager to cry “STOP!”; so this is a random variable. Similarly,
the ordering of the deck after k shuffles is given by a random variable $X_k$
(with values in $\mathfrak{S}_n$). With this, the stopping rule is strong uniform if for all
feasible values of k,

$$\operatorname{Prob}[X_k = \pi \mid T = k] = \frac{1}{n!} \quad \text{for all } \pi \in \mathfrak{S}_n.$$
Three aspects make this interesting, useful, and remarkable: 


1. Strong uniform stopping rules exist: For many examples they are quite 
simple. 


2. Moreover, these can be analyzed: Trying to determine $\operatorname{Prob}[T > k]$
leads often to simple combinatorial problems.


3. This yields effective upper bounds on the variation distances such as
$d(k) = \|\mathrm{Top}^{*k} - U\|$.


For example, for the top-in-at-random shuffles a strong uniform stopping
rule is


“STOP after the original bottom card (labelled n) is first inserted
back into the deck.”


Indeed, if we trace the card n during these shuffles, 


Conditional probabilities
The conditional probability $\operatorname{Prob}[A \mid B]$ denotes the probability of the event
A under the condition that B happens. This is just the probability that
both events happen, divided by the probability that B is true, that is,

$$\operatorname{Prob}[A \mid B] = \frac{\operatorname{Prob}[A \wedge B]}{\operatorname{Prob}[B]}.$$


Figure: card n traced through a run of top-in-at-random shuffles (n = 5); $T_1$ marks the first time a card lies below card n.


we see that during the whole process the ordering of the cards below this 
card is completely uniform. So, after the card n rises to the top and then is 
inserted at random, the deck is uniformly distributed; we just don’t know 
when precisely this happens (but the manager does). 
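That this stopping rule is strong uniform can be verified exactly for small decks: enumerate all $n^k$ equally likely sequences of insertion places, keep those for which the rule fires at exactly step k, and tally the resulting orderings. A brute-force sketch with illustrative names, using tuples for decks (top card first):

```python
from collections import Counter
from itertools import permutations, product


def top_in_at_random(deck, i):
    """pi_i on a tuple deck (top card first): move the top card to place i."""
    return deck[1:i] + (deck[0],) + deck[i:]


def stopped_distribution(n, k):
    """Tally the final orderings over all n**k equally likely sequences of
    insertion places for which the rule 'STOP after the original bottom card
    n is first inserted back' fires at exactly step k."""
    counts = Counter()
    for places in product(range(1, n + 1), repeat=k):
        deck = tuple(range(1, n + 1))
        stop_step = None
        for step, i in enumerate(places, start=1):
            moved = deck[0]
            deck = top_in_at_random(deck, i)
            if moved == n:          # card n has just been inserted back
                stop_step = step
                break
        if stop_step == k:
            counts[deck] += 1
    return counts


def is_uniform(counts, n):
    """True if every ordering of n cards occurs, and all equally often."""
    return (set(counts) == set(permutations(range(1, n + 1)))
            and len(set(counts.values())) == 1)
```

Since card n needs n - 1 cards below it before it can reach the top, the rule never fires before step n, so the interesting values are k ≥ n.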

Now let $T_i$ be the random variable which counts the number of shuffles that
are performed until for the first time i cards lie below card n. So we have
to determine the distribution of

$$T = T_1 + (T_2 - T_1) + \cdots + (T_{n-1} - T_{n-2}) + (T - T_{n-1}).$$

But each summand in this corresponds to a coupon collector's problem:
$T_i - T_{i-1}$ is the time until the top card is inserted at one of the i possible
places below the card n. So it is also the time that the coupon collector
takes from the $(n-i)$-th coupon to the $(n-i+1)$-st coupon. Let $V_i$ be
the number of pictures bought until he has i different pictures. Then

$$V_n = V_1 + (V_2 - V_1) + \cdots + (V_{n-1} - V_{n-2}) + (V_n - V_{n-1}),$$
and we have seen that $\operatorname{Prob}[T_i - T_{i-1} = j] = \operatorname{Prob}[V_{n-i+1} - V_{n-i} = j]$
for all i and j. Hence the coupon collector and the top-in-at-random shuffler
perform equivalent sequences of independent random processes, just in the
opposite order (for the coupon collector, it's hard at the end). Thus we know
that the strong uniform stopping rule for the top-in-at-random shuffles takes
more than $k = \lceil n \log n + cn \rceil$ steps with low probability:

$$\operatorname{Prob}[T > k] \le e^{-c}.$$

And this in turn means that after $k = \lceil n \log n + cn \rceil$ top-in-at-random
shuffles, our deck is “close to random,” with

$$d(k) = \|\mathrm{Top}^{*k} - U\| \le e^{-c},$$

due to the following simple but crucial lemma.


Lemma. Let $Q : \mathfrak{S}_n \to \mathbb{R}$ be any probability distribution that defines a
shuffling process $Q^{*k}$ with a strong uniform stopping rule whose stopping
time is T. Then for all $k \ge 0$,

$$\|Q^{*k} - U\| \le \operatorname{Prob}[T > k].$$


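Proof. If X is a random variable with values in $\mathfrak{S}_n$, with probability
distribution Q, then we write Q(S) for the probability that X takes a value
in $S \subseteq \mathfrak{S}_n$. Thus $Q(S) = \operatorname{Prob}[X \in S]$, and in the case of the uniform
distribution Q = U we get

$$U(S) = \operatorname{Prob}[X \in S] = \frac{|S|}{n!}.$$

For every subset $S \subseteq \mathfrak{S}_n$, we get the probability that after k steps our deck
is ordered according to a permutation in S as

$$\begin{aligned}
Q^{*k}(S) &= \operatorname{Prob}[X_k \in S] \\
&= \sum_{j \le k} \operatorname{Prob}[X_k \in S \wedge T = j] + \operatorname{Prob}[X_k \in S \wedge T > k] \\
&= \sum_{j \le k} U(S)\,\operatorname{Prob}[T = j] + \operatorname{Prob}[X_k \in S \mid T > k] \cdot \operatorname{Prob}[T > k] \\
&= U(S)\,\big(1 - \operatorname{Prob}[T > k]\big) + \operatorname{Prob}[X_k \in S \mid T > k] \cdot \operatorname{Prob}[T > k] \\
&= U(S) + \big(\operatorname{Prob}[X_k \in S \mid T > k] - U(S)\big) \cdot \operatorname{Prob}[T > k].
\end{aligned}$$

This yields

$$|Q^{*k}(S) - U(S)| \le \operatorname{Prob}[T > k]$$

since

$$\operatorname{Prob}[X_k \in S \mid T > k] - U(S)$$

is a difference of two probabilities, so it has absolute value at most 1.
Taking the maximum over all $S \subseteq \mathfrak{S}_n$ in $\|Q^{*k} - U\| = \max_S |Q^{*k}(S) - U(S)|$ proves the lemma.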
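The lemma can be tested numerically for a small deck. The sketch below (illustrative names, n = 4) computes d(k) exactly by pushing the distribution of the deck through k shuffles, and computes Prob[T > k] from the number j of cards below card n, which moves from j to j + 1 with probability (j + 1)/n as explained above. Both are exact `Fraction`s.

```python
from fractions import Fraction
from itertools import permutations
from math import ceil, exp, factorial, log


def top_in_at_random(deck, i):
    """pi_i on a tuple deck (top card first): move the top card to place i."""
    return deck[1:i] + (deck[0],) + deck[i:]


def d(n, k):
    """Exact d(k) = ||Top^{*k} - U|| for a deck of n cards."""
    dist = {tuple(range(1, n + 1)): Fraction(1)}
    for _ in range(k):
        new = {}
        for deck, p in dist.items():
            for i in range(1, n + 1):
                q = top_in_at_random(deck, i)
                new[q] = new.get(q, 0) + p / n
        dist = new
    u = Fraction(1, factorial(n))
    return Fraction(1, 2) * sum(abs(dist.get(p, 0) - u)
                                for p in permutations(range(1, n + 1)))


def prob_T_greater(n, k):
    """Exact Prob[T > k]. State j = number of cards below card n; a shuffle
    raises j with probability (j + 1)/n, and from j = n - 1 (card n on top)
    the next shuffle inserts card n, which stops the process."""
    alive = {0: Fraction(1)}  # distribution of j on the event 'not stopped yet'
    for _ in range(k):
        new = {}
        for j, p in alive.items():
            up = Fraction(j + 1, n)
            if j + 1 < n:
                new[j + 1] = new.get(j + 1, 0) + p * up
            if up < 1:
                new[j] = new.get(j, 0) + p * (1 - up)
        alive = new
    return sum(alive.values())


n = 4
print([float(d(n, k)) for k in range(0, 16, 3)])
```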


This is the point where we have completed our analysis of the top-in-at- 
random shuffle: We have proved the following upper bound for the number 
of shuffles needed to get “close to random.” 
Theorem 1. Let $c \ge 0$ and $k = \lceil n \log n + cn \rceil$. Then after performing k
top-in-at-random shuffles on a deck of n cards, the variation distance from
the uniform distribution satisfies

$$d(k) = \|\mathrm{Top}^{*k} - U\| \le e^{-c}.$$


One can also verify that the variation distance d(k) stays large if we do 
significantly fewer than n log n top-in-at-random shuffles. The reason is 
that a smaller number of shuffles will not suffice to destroy the relative 
ordering on the lowest few cards in the deck. 

Of course, top-in-at-random shuffles are extremely inefficient: with the
bounds of our theorem, we need more than $n \log n \approx 205$ top-in-at-random
shuffles until a deck of n = 52 cards is mixed up well. Thus we now switch
our attention to a much more interesting and realistic model of shuffling.


Riffle shuffles 


This is what dealers do at the casino: They take the deck, split it into two 
parts, and these are then interleaved, for example by dropping cards from 
the bottoms of the two half-decks in some irregular pattern. 

Again a riffle shuffle performs a certain permutation on the cards in the
deck, which we initially assume to be labelled from 1 to n, where 1 is the
top card. The riffle shuffles correspond exactly to the permutations $\pi \in \mathfrak{S}_n$
such that the sequence

$$(\pi(1), \pi(2), \ldots, \pi(n))$$

consists of two interlaced increasing sequences (only for the identity permutation
is it one increasing sequence), and there are exactly $2^n - n$
distinct riffle shuffles on a deck of n cards.


Figure: a riffle shuffle, with the deck split into two packets that are then interleaved.


In fact, if the pack is split such that the top t cards are taken into the right
hand ($0 \le t \le n$) and the other $n - t$ cards into the left hand, then there are
$\binom{n}{t}$ ways to interleave the two hands, all of which generate distinct permutations,
except that for each t there is one possibility to obtain the identity
permutation.
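The count $2^n - n$ can be confirmed by generating all cuts and interleavings. A short sketch (illustrative name; decks listed from top to bottom, with the right hand holding the top t cards):

```python
from itertools import combinations


def riffle_permutations(n):
    """All orderings produced by one riffle shuffle of cards 1..n (top first):
    cut off the top t cards into the right hand, then interleave; `slots` are
    the positions in the new deck that receive right-hand cards."""
    deck = list(range(1, n + 1))
    results = set()
    for t in range(n + 1):
        for slots in combinations(range(n), t):
            right, left = iter(deck[:t]), iter(deck[t:])
            results.add(tuple(next(right) if j in slots else next(left)
                              for j in range(n)))
    return results
```

Each of the $\sum_t \binom{n}{t} = 2^n$ (cut, interleaving) pairs is generated once, and the identity arises n + 1 times, which accounts for the $2^n - n$.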

The inverse riffle shuffles correspond to the permutations $\pi = (\pi(1), \ldots, \pi(n))$
that are increasing except for at most one “descent.” (Only the identity
permutation has no descent.)


Now it's not clear which probability distribution one should put on the riffle
shuffles; there is no unique answer since amateurs and professional dealers
would shuffle differently. However, the following model, developed
first by Edgar N. Gilbert and Claude Shannon in 1955 (at the legendary
Bell Labs “Mathematics of Communication” department at the time), has
several virtues:

• it is elegant, simple, and seems natural,
• it models quite well the way an amateur would perform riffle shuffles,
• and we have a chance to analyze it.

Here are three descriptions, all of which describe the same probability
distribution Rif on $\mathfrak{S}_n$:


1. $\mathrm{Rif} : \mathfrak{S}_n \to \mathbb{R}$ is defined by

$$\mathrm{Rif}(\pi) := \begin{cases} \frac{n+1}{2^n} & \text{if } \pi = \mathrm{id}, \\ \frac{1}{2^n} & \text{if } \pi \text{ consists of two increasing sequences}, \\ 0 & \text{otherwise.} \end{cases}$$


2. Cut off t cards from the deck with probability $\frac{1}{2^n}\binom{n}{t}$, take them into
your right hand, and take the rest of the deck into your left hand. Now
when you have r cards in the right hand and $\ell$ in the left, “drop” the
bottom card from your right hand with probability $\frac{r}{r+\ell}$, and from your
left hand with probability $\frac{\ell}{r+\ell}$. Repeat!


3. An inverse shuffle would take a subset of the cards in the deck, remove
them from the deck, and place them on top of the remaining cards of
the deck, while maintaining the relative order in both parts of the
deck. Such a move is determined by the subset of the cards: Take all
subsets with equal probability.

Equivalently, assign a label “0” or “1” to each card, randomly and independently
with probabilities $\frac{1}{2}$, and move the cards labelled “0” to
the top.


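It is easy to see that these descriptions yield the same probability distributions. For (1) $\Longleftrightarrow$ (3) just observe that we get the identity permutation whenever all the 0-cards are on top of all the cards that are assigned a 1. This happens for exactly $n+1$ of the $2^n$ labelings, while every other labeling yields a different permutation.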
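For small $n$ the three descriptions can be compared exactly using rational arithmetic. This is a sketch under our own conventions. A deck is a tuple listed top card first. In description (2) the new pile is built from the bottom up, so the first card dropped ends at the bottom. Description (3) produces an inverse shuffle, which is turned into a riffle shuffle by inverting the permutation. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product
from math import comb

def is_riffle(seq):
    """True if seq interleaves 1..t and t+1..n, both in increasing order, for some cut t."""
    n = len(seq)
    for t in range(n + 1):
        low = [c for c in seq if c <= t]
        high = [c for c in seq if c > t]
        if low == sorted(low) and high == sorted(high):
            return True
    return False

def rif_formula(seq):
    """Description (1)."""
    n = len(seq)
    if list(seq) == list(range(1, n + 1)):
        return Fraction(n + 1, 2 ** n)
    return Fraction(1, 2 ** n) if is_riffle(seq) else Fraction(0)

def rif_by_dropping(n):
    """Description (2): cut t cards with probability C(n,t)/2^n, then drop bottom cards one by one."""
    dist = {}

    def drop(right, left, pile, p):
        if not right and not left:
            deck = tuple(reversed(pile))           # the first card dropped ends up at the bottom
            dist[deck] = dist.get(deck, 0) + p
            return
        total = len(right) + len(left)
        if right:
            drop(right[:-1], left, pile + [right[-1]], p * Fraction(len(right), total))
        if left:
            drop(right, left[:-1], pile + [left[-1]], p * Fraction(len(left), total))

    for t in range(n + 1):
        drop(list(range(1, t + 1)), list(range(t + 1, n + 1)), [], Fraction(comb(n, t), 2 ** n))
    return dist

def rif_by_labels(n):
    """Description (3): random 0/1 labels, 0-cards to the top; the riffle shuffle is the inverse."""
    dist = {}
    for bits in product((0, 1), repeat=n):         # bits[c-1] is the label of card c
        inverse_deck = [c for c in range(1, n + 1) if bits[c - 1] == 0] + \
                       [c for c in range(1, n + 1) if bits[c - 1] == 1]
        deck = [0] * n
        for pos, card in enumerate(inverse_deck, start=1):
            deck[card - 1] = pos                   # invert the permutation
        key = tuple(deck)
        dist[key] = dist.get(key, 0) + Fraction(1, 2 ** n)
    return dist

for n in range(1, 6):
    by_dropping, by_labels = rif_by_dropping(n), rif_by_labels(n)
    for perm in permutations(range(1, n + 1)):
        assert by_dropping.get(perm, 0) == by_labels.get(perm, 0) == rif_formula(perm)
```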


This defines the model. So how can we analyze it? How many riffle shuffles 
are needed to get close to random? We won’t get the precise best-possible 
answer, but quite a good one, by combining three components: 


(1) We analyze inverse riffle shuffles instead, 
(2) we describe a strong uniform stopping rule for these, 


(3) and show that the key to its analysis is given by the birthday paradox! 


Theorem 2. After performing $k$ riffle shuffles on a deck of $n$ cards, the variation distance from a uniform distribution satisfies
$$\|\mathrm{Rif}^{*k} - U\| \;\le\; 1 - \prod_{i=1}^{n-1}\Big(1 - \frac{i}{2^k}\Big).$$
Proof. (1) We may indeed analyze inverse riffle shuffles and try to see how fast they get us from the starting distribution to (close to) uniform. These inverse shuffles correspond to the probability distribution that is given by $\overline{\mathrm{Rif}}(\pi) := \mathrm{Rif}(\pi^{-1})$.

Now the fact that every permutation has its unique inverse, and the fact that $U(\pi) = U(\pi^{-1})$, yield
$$\|\mathrm{Rif}^{*k} - U\| = \|\overline{\mathrm{Rif}}^{*k} - U\|.$$
(This is Reeds’ inversion lemma!)
(2) In every inverse riffle shuffle, each card is assigned a digit 0 or 1.


If we remember these digits (say we just write them onto the cards), then after $k$ inverse riffle shuffles each card carries an ordered string of $k$ digits. Our stopping rule is:


“STOP as soon as all cards have distinct strings.” 


When this happens, the cards in the deck are sorted according to the binary numbers $b_k b_{k-1} \dots b_2 b_1$, where $b_i$ is the bit that the card has picked up in the $i$-th inverse riffle shuffle. Since these bits are perfectly random and independent, this stopping rule is strong uniform!

In the following example, for $n = 5$ cards, we need $T = 3$ inverse shuffles until we stop:

000 4      00 4      0 1      1
001 2      01 2      0 4      2
010 1  ⇐  01 5  ⇐  1 2  ⇐  3
101 5      10 1      1 3      4
111 3      11 3      1 5      5
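The stopping rule is easy to simulate. The sketch below uses our own helper names and takes the labels as explicit input, so the rounds of the example can be replayed. Each new bit is prepended to the card's string, and the 0-cards move to the top in their old order. Replaying the three rounds read off the table gives the deck 4, 2, 1, 5, 3 with distinct strings. A random run then illustrates that at the stopping time the deck is sorted by the binary numbers $b_k \dots b_1$.

```python
import random

def inverse_riffle(deck, labels, strings):
    """One inverse riffle shuffle.

    labels maps each card to its new bit; the bit is prepended to the card's string
    (so strings read b_k ... b_1), and the 0-cards move to the top in their old order.
    """
    for card in deck:
        strings[card] = str(labels[card]) + strings[card]
    return [c for c in deck if labels[c] == 0] + [c for c in deck if labels[c] == 1]

def shuffle_until_distinct(n, rng):
    """Random inverse riffle shuffles until all strings differ (the stopping rule); returns deck, strings, T."""
    deck = list(range(1, n + 1))
    strings = {c: "" for c in deck}
    steps = 0
    while len(set(strings.values())) < n:
        labels = {c: rng.randint(0, 1) for c in deck}
        deck = inverse_riffle(deck, labels, strings)
        steps += 1
    return deck, strings, steps

# Replaying the example for n = 5 (labels read off the table above).
rounds = [{1: 0, 2: 1, 3: 1, 4: 0, 5: 1},
          {1: 1, 2: 0, 3: 1, 4: 0, 5: 0},
          {1: 0, 2: 0, 3: 1, 4: 0, 5: 1}]
deck, strings = [1, 2, 3, 4, 5], {c: "" for c in range(1, 6)}
for k, labels in enumerate(rounds, start=1):
    deck = inverse_riffle(deck, labels, strings)
    print(k, deck, [strings[c] for c in deck], len(set(strings.values())) == 5)
# only the third round yields five distinct strings

# In a random run, the final deck is sorted by the binary numbers b_k ... b_1.
final, codes, T = shuffle_until_distinct(8, random.Random(0))
assert final == sorted(final, key=lambda c: int(codes[c], 2))
```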


(3) The time $T$ taken by this stopping rule is distributed according to the birthday paradox, for $K = 2^k$: We put two cards into the same box if they have the same label $b_k b_{k-1} \dots b_2 b_1 \in \{0,1\}^k$. So there are $K = 2^k$ boxes, and the probability that some box gets more than one card is
$$\mathrm{Prob}[T > k] = 1 - \prod_{i=1}^{n-1}\Big(1 - \frac{i}{2^k}\Big),$$
and as we have seen this bounds the variation distance $\|\mathrm{Rif}^{*k} - U\| = \|\overline{\mathrm{Rif}}^{*k} - U\|$.
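A quick Monte Carlo experiment shows that the stopping time behaves like the birthday problem. In this sketch (function names are ours), each card's label is kept as an integer, and the latest bit $b_k$ becomes its new most significant digit. Only distinctness matters for $T$, so the deck order is not tracked.

```python
import random
from math import prod

def prob_T_exceeds(n, k):
    """Birthday-paradox formula: probability that n random k-bit labels are not all distinct."""
    return 1 - prod(1 - i / 2 ** k for i in range(1, n))

def stopping_time(n, rng):
    """Number of inverse riffle shuffles until the n cards carry distinct strings b_k ... b_1."""
    labels = [0] * n
    k = 0
    while len(set(labels)) < n:
        # the new bit b_{k+1} becomes the most significant digit of each label
        labels = [lab + (rng.randint(0, 1) << k) for lab in labels]
        k += 1
    return k

rng = random.Random(2024)
n, trials = 5, 20000
times = [stopping_time(n, rng) for _ in range(trials)]
for k in range(1, 8):
    empirical = sum(t > k for t in times) / trials
    print(k, round(empirical, 3), round(prob_T_exceeds(n, k), 3))
```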
k     d(k)
1     1.000
2     1.000
3     1.000
4     1.000
5     0.952
6     0.614
7     0.334
8     0.167
9     0.085
10    0.043
The variation distance after k riffle shuffles, according to [2]

“Random enough?”


So how often do we have to shuffle? For large $n$ we will need roughly $k = 2\log_2(n)$ shuffles. Indeed, setting $k := 2\log_2(cn)$ for some $c > 1$ (so that $2^k = c^2n^2$) we find with a bit of calculus that
$$\mathrm{Prob}[T > k] \approx 1 - e^{-1/(2c^2)} \approx \frac{1}{2c^2},$$
since $\log \prod_{i=1}^{n-1}(1 - i/2^k) \approx -\sum_{i=1}^{n-1} i/2^k \approx -n^2/2^{k+1} = -1/(2c^2)$.

Explicitly, for $n = 52$ cards the upper bound of Theorem 2 reads $d(10) \le 0.73$, $d(12) \le 0.28$, $d(14) \le 0.08$, so $k = 12$ should be “random enough” for all practical purposes. But we don’t do 12 shuffles “in practice”, and they are not really necessary, as a more detailed analysis shows (with the results given in the margin). The analysis of riffle shuffles is part of a lively ongoing discussion about the right measure of what is “random enough.” Diaconis [4] is a guide to recent developments.
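The values quoted for $n = 52$ follow directly from Theorem 2. This minimal sketch evaluates the bound in floating point. Its second part compares the bound with the approximation $1 - e^{-1/(2c^2)}$ for a large deck. There $k = 2\log_2(cn)$ is allowed to be non-integral, purely for illustration.

```python
from math import exp, log2, prod

def theorem2_bound(n, k):
    """Right-hand side of Theorem 2: 1 - prod_{i=1}^{n-1} (1 - i / 2^k)."""
    return 1 - prod(1 - i / 2 ** k for i in range(1, n))

for k in (10, 12, 14):
    print(k, round(theorem2_bound(52, k), 2))   # 0.73, 0.28, 0.08 as quoted in the text

# For k = 2 log2(c n) the bound is close to 1 - exp(-1/(2 c^2)) when n is large.
n, c = 10 ** 5, 2.0
k = 2 * log2(c * n)
print(theorem2_bound(n, k), 1 - exp(-1 / (2 * c ** 2)))
```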


Indeed, does it matter? Yes, it does: Even after three good riffle shuffles a 
sorted deck of 52 cards looks quite random ... but it isn’t. Martin Gardner 
[5, Chapter 7] describes a number of striking card tricks that are based on 
the hidden order in such a deck! 
Lattice paths and determinants


The essence of mathematics is proving theorems — and so, that is what 
mathematicians do: They prove theorems. But to tell the truth, what they 
really want to prove, once in their lifetime, is a Lemma, like the one by 
Fatou in analysis, the Lemma of Gauss in number theory, or the Burnside–Frobenius Lemma in combinatorics.

Now what makes a mathematical statement a true Lemma? First, it should 
be applicable to a wide variety of instances, even seemingly unrelated prob- 
lems. Secondly, the statement should, once you have seen it, be completely 
obvious. The reaction of the reader might well be one of faint envy: Why 
haven’t I noticed this before? And thirdly, on an esthetic level, the Lemma 
— including its proof — should be beautiful! 

In this chapter we look at one such marvelous piece of mathematical reasoning, a counting lemma that first appeared in a paper by Bernt Lindström in 1972. Largely overlooked at the time, the result became an instant classic in 1985, when Ira Gessel and Gérard Viennot rediscovered it and demonstrated in a wonderful paper how the lemma could be successfully applied to a diversity of difficult combinatorial enumeration problems.


The starting point is the usual permutation representation of the determinant of a matrix. Let $M = (m_{ij})$ be a real $n \times n$ matrix. Then
$$\det M = \sum_{\sigma} \operatorname{sign}\sigma\; m_{1\sigma(1)}\, m_{2\sigma(2)} \cdots m_{n\sigma(n)}, \qquad (1)$$
where $\sigma$ runs through all permutations of $\{1, 2, \dots, n\}$, and the sign of $\sigma$ is $1$ or $-1$, depending on whether $\sigma$ is the product of an even or an odd number of transpositions.
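Formula (1) can be transcribed literally into code. In the sketch below, the helper names are ours. It runs over all $n!$ permutations, so it is meant only for small matrices. The sign is computed from the number of inversions, whose parity equals the parity of the number of transpositions.

```python
from itertools import permutations
from math import prod

def sign(sigma):
    """Sign of a permutation of 0..n-1 given as a tuple: +1 for an even, -1 for an odd number of inversions."""
    n = len(sigma)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def det_leibniz(M):
    """Formula (1): sum of sign(sigma) * m[0][sigma(0)] * ... * m[n-1][sigma(n-1)] over all permutations."""
    n = len(M)
    return sum(sign(s) * prod(M[i][s[i]] for i in range(n)) for s in permutations(range(n)))

print(det_leibniz([[1, 2], [3, 4]]))   # 1*4 - 2*3 = -2
```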


Now we pass to graphs, more precisely to weighted directed bipartite graphs. 


Let the vertices $A_1, \dots, A_n$ stand for the rows of $M$, and $B_1, \dots, B_n$ for the columns. For each pair of $i$ and $j$ draw an arrow from $A_i$ to $B_j$ and give it the weight $m_{ij}$, as in the figure.


In terms of this graph, the formula (1) has the following interpretation: 


• The left-hand side is the determinant of the path matrix $M$, whose $(i,j)$-entry is the weight of the (unique) directed path from $A_i$ to $B_j$.

• The right-hand side is the weighted (signed) sum over all vertex-disjoint path systems from $\mathcal{A} = \{A_1, \dots, A_n\}$ to $\mathcal{B} = \{B_1, \dots, B_n\}$. Such a system $\mathcal{P}_\sigma$ is given by paths
$$A_1 \to B_{\sigma(1)}, \;\dots,\; A_n \to B_{\sigma(n)},$$
and the weight of the path system $\mathcal{P}_\sigma$ is the product of the weights of the individual paths:
$$w(\mathcal{P}_\sigma) = w(A_1 \to B_{\sigma(1)}) \cdots w(A_n \to B_{\sigma(n)}).$$
In this interpretation formula (1) reads
$$\det M = \sum_{\sigma} \operatorname{sign}\sigma\; w(\mathcal{P}_\sigma).$$

[Figure: An acyclic directed graph]


And what is the result of Gessel and Viennot? It is the natural generalization 
of (1) from bipartite to arbitrary graphs. It is precisely this step which 
makes the Lemma so widely applicable — and what’s more, the proof is 
stupendously simple and elegant. 

Let us first collect the necessary concepts. We are given a finite acyclic directed graph $G = (V, E)$, where acyclic means that there are no directed cycles in $G$. In particular, there are only finitely many directed paths between any two vertices $A$ and $B$, where we include all trivial paths $A \to A$ of length 0. Every edge $e$ carries a weight $w(e)$. If $P$ is a directed path from $A$ to $B$, written shortly $P: A \to B$, then we define the weight of $P$ as
$$w(P) := \prod_{e \in P} w(e),$$
which is defined to be $w(P) = 1$ if $P$ is a path of length 0.


Now let A = {Aj,...,An} and B = {B,,...,B,} be two sets of n 
vertices, where A and 6 need not be disjoint. To A and 6 we associate the 
path matrix M = (mj;) with 


P:A,>B; 
A path system $\mathcal{P}$ from $\mathcal{A}$ to $\mathcal{B}$ consists of a permutation $\sigma$ together with $n$ paths $P_i: A_i \to B_{\sigma(i)}$, for $i = 1, \dots, n$; we write $\operatorname{sign}\mathcal{P} := \operatorname{sign}\sigma$. The weight of $\mathcal{P}$ is the product of the path weights
$$w(\mathcal{P}) := \prod_{i=1}^{n} w(P_i), \qquad (2)$$
which is the product of the weights of all the edges of the path system. Finally, we say that the path system $\mathcal{P} = (P_1, \dots, P_n)$ is vertex-disjoint if the paths of $\mathcal{P}$ are pairwise vertex-disjoint.

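Lemma. Let $G = (V, E)$ be a finite weighted acyclic directed graph, $\mathcal{A} = \{A_1, \dots, A_n\}$ and $\mathcal{B} = \{B_1, \dots, B_n\}$ two $n$-sets of vertices, and $M$ the path matrix from $\mathcal{A}$ to $\mathcal{B}$. Then
$$\det M = \sum_{\substack{\mathcal{P} \text{ vertex-disjoint}\\ \text{path system}}} \operatorname{sign}\mathcal{P}\; w(\mathcal{P}). \qquad (3)$$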
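Proof. A typical summand of $\det(M)$ is $\operatorname{sign}\sigma\; m_{1\sigma(1)} \cdots m_{n\sigma(n)}$, which can be written as
$$\operatorname{sign}\sigma \Big(\sum_{P_1:\, A_1 \to B_{\sigma(1)}} w(P_1)\Big) \cdots \Big(\sum_{P_n:\, A_n \to B_{\sigma(n)}} w(P_n)\Big).$$
Summing over $\sigma$ we immediately find from (2) that
$$\det M = \sum_{\mathcal{P}} \operatorname{sign}\mathcal{P}\; w(\mathcal{P}),$$
where $\mathcal{P}$ runs through all path systems from $\mathcal{A}$ to $\mathcal{B}$ (vertex-disjoint or not). Hence to arrive at (3), all we have to show is
$$\sum_{\mathcal{P} \in N} \operatorname{sign}\mathcal{P}\; w(\mathcal{P}) = 0, \qquad (4)$$
where $N$ is the set of all path systems that are not vertex-disjoint. And this is accomplished by an argument of singular beauty. Namely, we exhibit an involution $\pi: N \to N$ (without fixed points) such that for $\mathcal{P}$ and $\pi\mathcal{P}$
$$w(\pi\mathcal{P}) = w(\mathcal{P}) \quad\text{and}\quad \operatorname{sign}\pi\mathcal{P} = -\operatorname{sign}\mathcal{P}.$$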


Clearly, this will imply (4) and thus the formula (3) of the Lemma. 


The involution $\pi$ is defined in the most natural way. Let $\mathcal{P} \in N$ with paths $P_i: A_i \to B_{\sigma(i)}$. By definition, some pair of paths will intersect:

• Let $i_0$ be the minimal index such that $P_{i_0}$ shares some vertex with another path.

• Let $X$ be the first such common vertex on the path $P_{i_0}$.

• Let $j_0$ be the minimal index ($j_0 > i_0$) such that $P_{j_0}$ has the vertex $X$ in common with $P_{i_0}$.


Now we construct the new system $\pi\mathcal{P} = (P'_1, \dots, P'_n)$ as follows:

• Set $P'_k := P_k$ for all $k \ne i_0, j_0$.

• The new path $P'_{i_0}$ goes from $A_{i_0}$ to $X$ along $P_{i_0}$, and then continues to $B_{\sigma(j_0)}$ along $P_{j_0}$. Similarly, $P'_{j_0}$ goes from $A_{j_0}$ to $X$ along $P_{j_0}$ and continues to $B_{\sigma(i_0)}$ along $P_{i_0}$.

Clearly $\pi(\pi\mathcal{P}) = \mathcal{P}$, since the index $i_0$, the vertex $X$, and the index $j_0$ are the same as before. In other words, applying $\pi$ twice we switch back to the old paths $P_i$. Next, since $\pi\mathcal{P}$ and $\mathcal{P}$ use precisely the same edges, we certainly have $w(\pi\mathcal{P}) = w(\mathcal{P})$. And finally, since the new permutation $\sigma'$ is obtained by multiplying $\sigma$ with the transposition $(i_0, j_0)$, we find that $\operatorname{sign}\pi\mathcal{P} = -\operatorname{sign}\mathcal{P}$, and that’s it.
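The Lemma can be spot-checked by brute force on small random graphs. In this sketch (all names are ours), the acyclic graph has vertices 0 to 6, edges only from smaller to larger numbers, and random integer weights. The code builds the path matrix from all paths and takes its determinant via formula (1). It then compares that value with the signed sum over vertex-disjoint path systems on the right-hand side of (3).

```python
import random
from itertools import permutations, product
from math import prod

def sign(sigma):
    n = len(sigma)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def det(M):
    """Formula (1)."""
    n = len(M)
    return sum(sign(s) * prod(M[i][s[i]] for i in range(n)) for s in permutations(range(n)))

def random_dag(n_vertices, p, rng):
    """Edges u -> v only for u < v, so the graph is acyclic; weights are random integers 1..3."""
    return {(u, v): rng.randint(1, 3)
            for u in range(n_vertices) for v in range(u + 1, n_vertices) if rng.random() < p}

def paths(weights, u, v):
    """All directed paths u -> v as vertex tuples, including the trivial path (u,) when u == v."""
    if u == v:
        return [(u,)]
    return [(u,) + rest for (a, b) in weights if a == u for rest in paths(weights, b, v)]

def weight(weights, path):
    return prod(weights[e] for e in zip(path, path[1:]))   # empty product = 1 for length-0 paths

def both_sides(weights, A, B):
    """Return (det of the path matrix, signed sum over vertex-disjoint path systems)."""
    n = len(A)
    P = {(i, j): paths(weights, A[i], B[j]) for i in range(n) for j in range(n)}
    M = [[sum(weight(weights, q) for q in P[i, j]) for j in range(n)] for i in range(n)]
    rhs = 0
    for sigma in permutations(range(n)):
        for system in product(*(P[i, sigma[i]] for i in range(n))):
            vertex_sets = [set(q) for q in system]
            if sum(map(len, vertex_sets)) == len(set().union(*vertex_sets)):   # pairwise disjoint
                rhs += sign(sigma) * prod(weight(weights, q) for q in system)
    return det(M), rhs

rng = random.Random(1)
for _ in range(20):
    g = random_dag(7, 0.5, rng)
    lhs, rhs = both_sides(g, (0, 1, 2), (4, 5, 6))
    assert lhs == rhs
```

The sets $\mathcal{A}$ and $\mathcal{B}$ may share vertices, as the Lemma allows. In that case the trivial path of length 0 is included.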


The Gessel–Viennot Lemma can be used to derive all basic properties of determinants, just by looking at appropriate graphs. Let us consider one particularly striking example, the formula of Binet–Cauchy, which gives a very useful generalization of the product rule for determinants.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1

[Figure: the $a \times b$-lattice for $a = 3$, $b = 4$]

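Theorem. If $P$ is an $r \times s$ matrix and $Q$ an $s \times r$ matrix, $r \le s$, then
$$\det(PQ) = \sum_{Z} (\det P_Z)(\det Q_Z),$$
where $P_Z$ is the $r \times r$ submatrix of $P$ with column-set $Z$, $Q_Z$ is the $r \times r$ submatrix of $Q$ with the corresponding rows $Z$, and $Z$ runs through all $r$-subsets of $\{1, \dots, s\}$.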


Proof. Let the bipartite graph on $\mathcal{A}$ and $\mathcal{B}$ correspond to $P$ as before, and similarly the bipartite graph on $\mathcal{B}$ and $\mathcal{C}$ to $Q$. Consider now the concatenated graph as indicated in the figure on the left, and observe that the $(i,j)$-entry $m_{ij}$ of the path matrix $M$ from $\mathcal{A}$ to $\mathcal{C}$ is precisely $m_{ij} = \sum_k p_{ik} q_{kj}$, thus $M = PQ$.

Since the vertex-disjoint path systems from $\mathcal{A}$ to $\mathcal{C}$ in the concatenated graph correspond to pairs of systems from $\mathcal{A}$ to $Z$ resp. from $Z$ to $\mathcal{C}$, the result follows immediately from the Lemma, by noting that $\operatorname{sign}(\sigma\tau) = (\operatorname{sign}\sigma)(\operatorname{sign}\tau)$.
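Here is a numerical check of the Binet–Cauchy formula, written as a sketch with exact integer arithmetic; the helper names are ours. For each $r$-subset $Z$ of the indices $\{0, \dots, s-1\}$, it multiplies $\det P_Z$ by $\det Q_Z$ and compares the sum with $\det(PQ)$.

```python
import random
from itertools import combinations, permutations
from math import prod

def det(M):
    """Leibniz formula (1); fine for the tiny matrices used here."""
    n = len(M)
    total = 0
    for s in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if s[i] > s[j])
        total += (-1) ** inversions * prod(M[i][s[i]] for i in range(n))
    return total

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def binet_cauchy_sum(P, Q):
    """Sum over all r-subsets Z of {0..s-1} of det(P_Z) * det(Q_Z)."""
    r, s = len(P), len(Q)
    total = 0
    for Z in combinations(range(s), r):
        P_Z = [[P[i][k] for k in Z] for i in range(r)]   # columns Z of P
        Q_Z = [Q[k] for k in Z]                          # rows Z of Q
        total += det(P_Z) * det(Q_Z)
    return total

rng = random.Random(5)
for r, s in [(1, 3), (2, 2), (2, 4), (3, 5)]:
    P = [[rng.randint(-3, 3) for _ in range(s)] for _ in range(r)]
    Q = [[rng.randint(-3, 3) for _ in range(r)] for _ in range(s)]
    assert det(matmul(P, Q)) == binet_cauchy_sum(P, Q)
```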


The Lemma of Gessel–Viennot is also the source of a great number of results that relate determinants to enumerative properties. The recipe is always the same: Interpret the matrix $M$ as a path matrix, and try to compute the right-hand side of (3). As an illustration we will consider the original problem studied by Gessel and Viennot, which led them to their Lemma:

Suppose that $a_1 < a_2 < \cdots < a_n$ and $b_1 < b_2 < \cdots < b_n$ are two sets of natural numbers. We wish to compute the determinant of the matrix $M = (m_{ij})$, where $m_{ij}$ is the binomial coefficient $\binom{a_i}{b_j}$.


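In other words, Gessel and Viennot were looking at the determinants of arbitrary square matrices of Pascal’s triangle, such as the matrix
$$\det\begin{pmatrix} \binom{3}{1} & \binom{3}{3} & \binom{3}{4}\\[2pt] \binom{4}{1} & \binom{4}{3} & \binom{4}{4}\\[2pt] \binom{6}{1} & \binom{6}{3} & \binom{6}{4} \end{pmatrix} = \det\begin{pmatrix} 3 & 1 & 0\\ 4 & 4 & 1\\ 6 & 20 & 15 \end{pmatrix}$$
given by the bold entries of Pascal’s triangle, as displayed in the margin.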
As a preliminary step to the solution of the problem we recall a well-known result which connects binomial coefficients to lattice paths. Consider an $a \times b$-lattice as in the margin. Then the number of paths from the lower left-hand corner to the upper right-hand corner, where the only steps that are allowed for the paths are up (North) and to the right (East), is $\binom{a+b}{a}$.

The proof of this is easy: each path consists of an arbitrary sequence of $b$ “east” and $a$ “north” steps, and thus it can be encoded by a sequence of the form NENEEEN, consisting of $a+b$ letters, $a$ N’s and $b$ E’s. The number of such strings is the number of ways to choose $a$ positions of letters N from a total of $a+b$ positions, which is $\binom{a+b}{a} = \binom{a+b}{b}$.
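The path count and the word encoding can be compared directly. This sketch (names are ours) counts North/East paths by a recursion on the last step. Separately, it lists all words with exactly $a$ letters N and $b$ letters E.

```python
from functools import lru_cache
from itertools import combinations
from math import comb

@lru_cache(maxsize=None)
def lattice_paths(a, b):
    """North/East paths with a North and b East steps, by conditioning on the last step."""
    if a == 0 or b == 0:
        return 1
    return lattice_paths(a - 1, b) + lattice_paths(a, b - 1)

def encodings(a, b):
    """All words with a letters N and b letters E (choose the positions of the N's)."""
    return ["".join("N" if i in pos else "E" for i in range(a + b))
            for pos in map(set, combinations(range(a + b), a))]

print(lattice_paths(3, 4), len(encodings(3, 4)), comb(7, 3))   # all three agree
```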
Now look at the figure to the right, where $A_i$ is placed at the point $(0, -a_i)$ and $B_j$ at $(b_j, -b_j)$.

The number of paths from $A_i$ to $B_j$ in this grid that use only steps to the north and east is, by what we just proved, $\binom{b_j + (a_i - b_j)}{b_j} = \binom{a_i}{b_j}$. In other words, the matrix of binomials $M$ is precisely the path matrix from $\mathcal{A}$ to $\mathcal{B}$ in the directed lattice graph for which all edges have weight 1, and all edges are directed to go north or east. Hence to compute $\det M$ we may apply the Gessel–Viennot Lemma. A moment’s thought shows that every vertex-disjoint path system $\mathcal{P}$ from $\mathcal{A}$ to $\mathcal{B}$ must consist of paths $P_i: A_i \to B_i$ for all $i$. Thus the only possible permutation is the identity, which has sign $1$, and we obtain the beautiful result


$$\det\Big(\binom{a_i}{b_j}\Big) = \#\,\text{vertex-disjoint path systems from } \mathcal{A} \text{ to } \mathcal{B}.$$


In particular, this implies the far from obvious fact that $\det M$ is always nonnegative, since the right-hand side of the equality counts something. More precisely, one gets from the Gessel–Viennot Lemma that $\det M = 0$ if and only if $a_i < b_i$ for some $i$.
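For small instances, the identity between $\det\big(\binom{a_i}{b_j}\big)$ and the number of vertex-disjoint path systems can be checked by enumerating lattice paths. This sketch places $A_i$ at $(0, -a_i)$ and $B_j$ at $(b_j, -b_j)$, as in the text. It also returns the signed count from the right-hand side of (3). The instance $a = (3,4,6)$, $b = (1,3,4)$ in the usage line is our own choice for illustration, and all names are ours.

```python
from itertools import permutations, product
from math import comb, prod

def ne_paths(start, end):
    """All lattice paths from start to end with unit steps North (0,+1) and East (+1,0)."""
    (x, y), (X, Y) = start, end
    if x > X or y > Y:
        return []
    if (x, y) == (X, Y):
        return [((x, y),)]
    return [((x, y),) + rest for nxt in ((x + 1, y), (x, y + 1)) for rest in ne_paths(nxt, end)]

def sign(sigma):
    n = len(sigma)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def det(M):
    n = len(M)
    return sum(sign(s) * prod(M[i][s[i]] for i in range(n)) for s in permutations(range(n)))

def compare(a, b):
    """Return (det of the binomial matrix, # vertex-disjoint path systems, signed count from (3))."""
    n = len(a)
    A = [(0, -ai) for ai in a]
    B = [(bj, -bj) for bj in b]
    M = [[comb(ai, bj) for bj in b] for ai in a]
    count = signed = 0
    for sigma in permutations(range(n)):
        for system in product(*(ne_paths(A[i], B[sigma[i]]) for i in range(n))):
            sets = [set(p) for p in system]
            if sum(map(len, sets)) == len(set().union(*sets)):   # pairwise vertex-disjoint
                count += 1
                signed += sign(sigma)
    return det(M), count, signed

print(compare((3, 4, 6), (1, 3, 4)))
```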


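In our previous small example,
$$\det\begin{pmatrix} 3 & 1 & 0\\ 4 & 4 & 1\\ 6 & 20 & 15 \end{pmatrix} = 66 = \#\,\text{vertex-disjoint path systems in the lattice of the figure.}$$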


“Lattice paths” 
Cayley’s formula for the number of trees


One of the most beautiful formulas in enumerative combinatorics concerns 
the number of labeled trees. Consider the set $N = \{1, 2, \dots, n\}$. How many different trees can we form on this vertex set? Let us denote this number by $T_n$. Enumeration “by hand” yields $T_1 = 1$, $T_2 = 1$, $T_3 = 3$, $T_4 = 16$, with the trees shown in the following table:


[Table: the labeled trees on $n = 1, 2, 3, 4$ vertices]


Note that we consider labeled trees, that is, although there is only one tree of order 3 in the sense of graph isomorphism, there are 3 different labeled trees obtained by marking the inner vertex 1, 2 or 3. For $n = 5$ there are three nonisomorphic trees:


[Figure: the three nonisomorphic trees on 5 vertices]


For the first tree there are clearly 5 different labelings, and for the second and third there are $\frac{5!}{2} = 60$ labelings each, so we obtain $T_5 = 125$. This should be enough to conjecture $T_n = n^{n-2}$, and that is precisely Cayley’s result.
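For small $n$ the values $T_n$ can be confirmed by brute force. The sketch below uses our own helper names. It tests every set of $n-1$ edges of $K_n$ with a union-find structure, since on $n$ vertices any $n-1$ edges without a cycle form a spanning tree.

```python
from itertools import combinations

def is_forest(n, edges):
    """True if the edges on vertices 1..n contain no cycle (union-find)."""
    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

def count_labeled_trees(n):
    """Number of spanning trees of K_n: (n-1)-edge subsets without a cycle."""
    all_edges = list(combinations(range(1, n + 1), 2))
    return sum(is_forest(n, subset) for subset in combinations(all_edges, n - 1))

print([count_labeled_trees(n) for n in range(1, 7)])   # compare with n^(n-2)
```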


Theorem. There are $n^{n-2}$ different labeled trees on $n$ vertices.


This beautiful formula yields to equally beautiful proofs, drawing on a 
variety of combinatorial and algebraic techniques. We will outline three 
of them before presenting the proof which is to date the most beautiful of 
them all. 
[Figure: The four trees of $\mathcal{T}_2$]

First proof (Bijection). The classical and most direct method is to find a bijection from the set of all trees on $n$ vertices onto another set whose cardinality is known to be $n^{n-2}$. Naturally, the set of all ordered sequences $(a_1, \dots, a_{n-2})$ with $1 \le a_i \le n$ comes into mind. Thus we want to uniquely encode every tree $T$ by a sequence $(a_1, \dots, a_{n-2})$. Such a code was found by Prüfer and is contained in most books on graph theory.
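The text only mentions Prüfer’s code. For concreteness, here is a sketch of the standard encoding and its inverse: the encoder repeatedly deletes the smallest leaf and records its neighbour. The function names are ours, and this is one common formulation of the code, not a presentation taken from this chapter.

```python
import heapq
from itertools import product

def prufer_encode(n, edges):
    """Prüfer sequence of a labeled tree on {1, ..., n}: delete the smallest leaf, record its neighbour, n-2 times."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [v for v in adj if len(adj[v]) == 1]
    heapq.heapify(leaves)
    code = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        (nbr,) = adj[leaf]              # a leaf has exactly one neighbour
        code.append(nbr)
        adj[nbr].remove(leaf)
        if len(adj[nbr]) == 1:          # the neighbour just became a leaf
            heapq.heappush(leaves, nbr)
    return tuple(code)

def prufer_decode(code):
    """Inverse map: rebuild the n-1 edges (as sorted pairs) from a sequence of length n-2."""
    n = len(code) + 2
    degree = [0] + [1] * n              # degree[v] for v = 1..n; each occurrence in the code adds 1
    for x in code:
        degree[x] += 1
    leaves = [v for v in range(1, n + 1) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for x in code:
        leaf = heapq.heappop(leaves)
        edges.append((min(leaf, x), max(leaf, x)))
        degree[x] -= 1
        if degree[x] == 1:
            heapq.heappush(leaves, x)
    u, v = heapq.heappop(leaves), heapq.heappop(leaves)
    edges.append((min(u, v), max(u, v)))
    return edges

tree = prufer_decode((4, 4, 2))         # a tree on 5 vertices
print(tree, prufer_encode(5, tree))
```

Running over all $n^{n-2}$ sequences and checking that decoding yields distinct trees, which encode back to the same sequence, illustrates the bijection for small $n$.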

Here we want to discuss another bijection proof, due to Joyal, which is less known but of equal elegance and simplicity. For this, we consider not just trees on $N = \{1, \dots, n\}$ but trees together with two distinguished vertices, the left end $\bigcirc$ and the right end $\square$, which may coincide. Let $\mathcal{T}_n = \{(t;\, \bigcirc, \square)\}$ be this new set; then, clearly, $|\mathcal{T}_n| = n^2 T_n$.

Our goal is thus to prove $|\mathcal{T}_n| = n^n$. Now there is a set whose size is known to be $n^n$, namely the set $N^N$ of all mappings from $N$ into $N$. Thus our formula is proved if we can find a bijection from $N^N$ onto $\mathcal{T}_n$.

Let $f: N \to N$ be any map. We represent $f$ as a directed graph $G_f$ by drawing arrows from $i$ to $f(i)$.

For example, the map 


$$f = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10\\ 7 & 5 & 5 & 9 & 1 & 2 & 5 & 8 & 4 & 7 \end{pmatrix}$$
is represented by the directed graph in the margin.
Look at a component of $G_f$. Since there is precisely one edge emanating from each vertex, the component contains equally many vertices and edges, and hence precisely one directed cycle. Let $M \subseteq N$ be the union of the vertex sets of these cycles. A moment’s thought shows that $M$ is the unique maximal subset of $N$ such that the restriction of $f$ onto $M$ acts as a bijection on $M$. Write
$$f|_M = \begin{pmatrix} a & b & \cdots & z\\ f(a) & f(b) & \cdots & f(z) \end{pmatrix}$$
such that the numbers $a, b, \dots, z$ in the first row appear in natural order. This gives us an ordering $f(a), f(b), \dots, f(z)$ of $M$ according to the second row. Now $f(a)$ is our left end and $f(z)$ is our right end.


The tree $t$ corresponding to the map $f$ is now constructed as follows: Draw $f(a), \dots, f(z)$ in this order as a path from $f(a)$ to $f(z)$, and fill in the remaining vertices as in $G_f$ (deleting the arrows).


In our example above we obtain $M = \{1, 4, 5, 7, 8, 9\}$,
$$f|_M = \begin{pmatrix} 1 & 4 & 5 & 7 & 8 & 9\\ 7 & 9 & 1 & 5 & 8 & 4 \end{pmatrix},$$
and thus the tree $t$ depicted in the margin.
It is immediate how to reverse this correspondence: Given a tree $t$, we look at the unique path $P$ from the left end to the right end. This gives us the set $M$ and the mapping $f|_M$. The remaining correspondences $i \mapsto f(i)$ are then filled in according to the unique paths from $i$ to $P$.
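Joyal’s construction translates almost line by line into code. In this sketch (names are ours), a map $f$ on $\{1, \dots, n\}$ is a tuple with `f[i-1]` equal to $f(i)$. The cyclic vertices $M$ are those that return to themselves when $f$ is iterated. Enumerating all $n^n$ maps for small $n$ and checking that the resulting triples (tree, left end, right end) are all distinct illustrates the bijection.

```python
from itertools import product

def joyal(f):
    """Map f (a tuple with f[i-1] = f(i)) to (edge set of a tree, left end, right end)."""
    n = len(f)
    # M: vertices lying on a cycle of G_f, i.e. returning to themselves under iteration of f
    M = []
    for i in range(1, n + 1):
        x = f[i - 1]
        for _ in range(n):
            if x == i:
                M.append(i)
                break
            x = f[x - 1]
    spine = [f[a - 1] for a in M]        # second row of f|_M, with the first row in natural order
    edges = {frozenset(e) for e in zip(spine, spine[1:])}
    edges |= {frozenset((i, f[i - 1])) for i in range(1, n + 1) if i not in M}
    return frozenset(edges), spine[0], spine[-1]

def is_tree(n, edges):
    """n-1 edges on {1..n} connecting all vertices."""
    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for e in edges:
        u, v = tuple(e)
        parent[find(u)] = find(v)
    return len(edges) == n - 1 and len({find(v) for v in range(1, n + 1)}) == 1

for n in range(1, 6):
    images = {joyal(f) for f in product(range(1, n + 1), repeat=n)}
    assert len(images) == n ** n                    # injective on all n^n maps
    assert all(is_tree(n, t) for t, _, _ in images)
    print(n, len({t for t, _, _ in images}))       # number of distinct trees
```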
Second proof (Linear Algebra). We can think of $T_n$ as the number of spanning trees in the complete graph $K_n$. Now let us look at an arbitrary connected simple graph $G$ on $V = \{1, 2, \dots, n\}$, denoting by $t(G)$ the number of spanning trees; thus $T_n = t(K_n)$. The following celebrated result is Kirchhoff’s matrix-tree theorem (see [1]). Consider the incidence matrix $B = (b_{ie})$ of $G$, whose rows are labeled by $V$, the columns by $E$, where we write $b_{ie} = 1$ or $0$ depending on whether $i \in e$ or $i \notin e$. Note that $|E| \ge n - 1$ since $G$ is connected. In every column we replace one of the two 1’s by $-1$ in an arbitrary manner (this amounts to an orientation of $G$), and call the new matrix $C$. $M = CC^T$ is then a symmetric $n \times n$ matrix with the degrees $d_1, \dots, d_n$ in the main diagonal.


Proposition. We have $t(G) = \det M_{ii}$ for all $i = 1, \dots, n$, where $M_{ii}$ results from $M$ by deleting the $i$-th row and the $i$-th column.


Proof. The key to the proof is the Binet–Cauchy theorem proved in the previous chapter: When $P$ is an $r \times s$ matrix and $Q$ an $s \times r$ matrix, $r \le s$, then $\det(PQ)$ equals the sum of the products of determinants of corresponding $r \times r$ submatrices, where “corresponding” means that we take the same indices for the $r$ columns of $P$ and the $r$ rows of $Q$.


For $M_{ii}$ this means that
$$\det M_{ii} = \sum_{N} \det N \cdot \det N^T = \sum_{N} (\det N)^2,$$
where $N$ runs through all $(n-1) \times (n-1)$ submatrices of $C \setminus \{\text{row } i\}$. The $n-1$ columns of $N$ correspond to a subgraph of $G$ with $n-1$ edges on $n$ vertices, and it remains to show that
$$\det N = \begin{cases} \pm 1 & \text{if these edges span a tree},\\ 0 & \text{otherwise.} \end{cases}$$


Suppose the $n-1$ edges do not span a tree. Then there exists a component which does not contain $i$. Since the corresponding rows of this component add to 0, we infer that they are linearly dependent, and hence $\det N = 0$. Assume now that the columns of $N$ span a tree. Then there is a vertex $j_1 \ne i$ of degree 1; let $e_1$ be the incident edge. Deleting $j_1, e_1$ we obtain a tree with $n-2$ edges. Again there is a vertex $j_2 \ne i$ of degree 1 with incident edge $e_2$. Continue in this way until $j_1, j_2, \dots, j_{n-1}$ and $e_1, e_2, \dots, e_{n-1}$ with $j_k \in e_k$ are determined. Now permute the rows and columns to bring $j_k$ into the $k$-th row and $e_k$ into the $k$-th column. Since by construction $j_k \notin e_\ell$ for $k < \ell$, we see that the new matrix $N'$ is lower triangular with all elements on the main diagonal equal to $\pm 1$. Thus $\det N = \pm \det N' = \pm 1$, and we are done.


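For the special case $G = K_n$ we clearly obtain the $(n-1) \times (n-1)$ matrix
$$M_{ii} = \begin{pmatrix} n-1 & -1 & \cdots & -1\\ -1 & n-1 & \cdots & -1\\ \vdots & & \ddots & \vdots\\ -1 & -1 & \cdots & n-1 \end{pmatrix},$$
and an easy computation shows $\det M_{ii} = n^{n-2}$.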
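The matrix-tree theorem is easy to test numerically. This sketch (helper names are ours) builds the oriented incidence matrix $C$, forms $M = CC^T$, and deletes row and column $i$. It then evaluates the determinant exactly using fractions. A brute-force count of spanning trees serves as a cross-check. Vertices are numbered from 0 here.

```python
from fractions import Fraction
from itertools import combinations

def det_exact(M):
    """Determinant by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    n, result = len(A), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            result = -result
        result *= A[col][col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
    return result

def oriented_incidence(n, edges):
    """C: rows = vertices 0..n-1, columns = edges; +1 and -1 at the two ends (an arbitrary orientation)."""
    C = [[0] * len(edges) for _ in range(n)]
    for e, (u, v) in enumerate(edges):
        C[u][e], C[v][e] = 1, -1
    return C

def matrix_tree_count(n, edges, i=0):
    """det M_ii for M = C C^T."""
    C = oriented_incidence(n, edges)
    M = [[sum(a * b for a, b in zip(r1, r2)) for r2 in C] for r1 in C]
    M_ii = [row[:i] + row[i + 1:] for k, row in enumerate(M) if k != i]
    return det_exact(M_ii)

def brute_force_count(n, edges):
    """Count (n-1)-edge subsets without a cycle, i.e. spanning trees."""
    count = 0
    for subset in combinations(edges, n - 1):
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                x = parent[x]
            return x

        acyclic = True
        for u, v in subset:
            ru, rv = find(u), find(v)
            if ru == rv:
                acyclic = False
                break
            parent[ru] = rv
        count += acyclic
    return count

for n in range(2, 8):
    K_n = list(combinations(range(n), 2))
    print(n, matrix_tree_count(n, K_n), n ** (n - 2))
```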


“A nonstandard method of counting 
trees: Put a cat into each tree, walk your 
dog, and count how often he barks.” 
Third proof (Recursion). Another classical method in enumerative combinatorics is to establish a recurrence relation and to solve it by induction. The following idea is essentially due to Riordan and Rényi. To find the proper recursion, we consider a more general problem (which already appears in Cayley’s paper). Let $A$ be an arbitrary $k$-set of the vertices. By $T_{n,k}$ we denote the number of (labeled) forests on $\{1, \dots, n\}$ consisting of $k$ trees where the vertices of $A$ appear in different trees. Clearly, the set $A$ does not matter, only the size $k$. Note that $T_{n,1} = T_n$.


For example, $T_{4,2} = 8$ for $A = \{1, 2\}$.

[Figure: the forests counted by $T_{4,2}$]


Consider such a forest $F$ with $A = \{1, 2, \dots, k\}$, and suppose 1 is adjacent to $i$ vertices, as indicated in the margin. Deleting 1, the $i$ neighbors together with $2, \dots, k$ yield one vertex each in the components of a forest that consists of $k - 1 + i$ trees. As we can (re)construct $F$ by first fixing $i$, then choosing the $i$ neighbors of 1 and then the forest $F \setminus 1$, this yields
$$T_{n,k} = \sum_{i=0}^{n-k} \binom{n-k}{i}\, T_{n-1,\,k-1+i} \qquad (1)$$
for all $n \ge k \ge 1$, where we set $T_{0,0} = 1$, $T_{n,0} = 0$ for $n > 0$. Note that $T_{0,0} = 1$ is necessary to ensure $T_{n,n} = 1$.
Proposition. We have
$$T_{n,k} = k\, n^{n-k-1}, \qquad (2)$$
and thus, in particular,
$$T_{n,1} = T_n = n^{n-2}.$$


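Proof. By (1), and using induction, we find
$$\begin{aligned}
T_{n,k} &= \sum_{i=0}^{n-k} \binom{n-k}{i} (k-1+i)(n-1)^{n-1-k-i} && (i \to n-k-i)\\
&= \sum_{i=0}^{n-k} \binom{n-k}{i} (n-1-i)(n-1)^{i-1}\\
&= \sum_{i=0}^{n-k} \binom{n-k}{i} (n-1)^{i} - \sum_{i=1}^{n-k} i \binom{n-k}{i} (n-1)^{i-1}\\
&= n^{n-k} - (n-k) \sum_{i=1}^{n-k} \binom{n-1-k}{i-1} (n-1)^{i-1}\\
&= n^{n-k} - (n-k)\, n^{n-1-k} = k\, n^{n-1-k}.
\end{aligned}$$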
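The recursion (1) and the closed form (2) can be compared numerically. For very small $n$, a brute-force count of the forests themselves is also feasible. This is a sketch with our own helper names; `T_brute` fixes $A = \{1, \dots, k\}$. The check of (2) multiplies through by $n$ to avoid negative exponents.

```python
from functools import lru_cache
from itertools import combinations
from math import comb

@lru_cache(maxsize=None)
def T(n, k):
    """T_{n,k} from recursion (1), with T_{0,0} = 1 and T_{n,0} = 0 for n > 0."""
    if k == 0:
        return 1 if n == 0 else 0
    return sum(comb(n - k, i) * T(n - 1, k - 1 + i) for i in range(n - k + 1))

def T_brute(n, k):
    """Forests on {1..n} with k trees in which 1, ..., k lie in different trees (brute force)."""
    all_edges = list(combinations(range(1, n + 1), 2))
    count = 0
    for subset in combinations(all_edges, n - k):    # k trees <=> n - k edges, if acyclic
        parent = list(range(n + 1))

        def find(x):
            while parent[x] != x:
                x = parent[x]
            return x

        acyclic = True
        for u, v in subset:
            ru, rv = find(u), find(v)
            if ru == rv:
                acyclic = False
                break
            parent[ru] = rv
        if acyclic and len({find(r) for r in range(1, k + 1)}) == k:
            count += 1
    return count

for n in range(1, 9):
    for k in range(1, n + 1):
        assert T(n, k) * n == k * n ** (n - k)      # formula (2), multiplied by n
print(T_brute(4, 2), T(4, 2))
```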
Fourth proof (Double Counting). The following marvelous proof due to Arnon Avron and Nachum Dershowitz, which builds on an idea of Jim Pitman, gives Cayley’s formula and its generalization (2) without induction or bijection; it is just clever counting in two ways.

We consider labeled forests with vertex set $\{1, \dots, n\}$. A rooted forest is a forest together with a choice of a root in each component tree. Let $\mathcal{F}_{n,k}$ be the set of all rooted forests that consist of $k$ rooted trees. Thus $\mathcal{F}_{n,1}$ is the set of all rooted trees. Let us set $F_{n,k} := |\mathcal{F}_{n,k}|$, and note that $F_{n,k} = \binom{n}{k} T_{n,k}$ with $T_{n,k}$ as in the third proof, since we may choose the $k$ roots in $\binom{n}{k}$ possible ways. In particular, $F_{n,1} = nT_n$.


Here is the crucial idea: We count in two ways the number of rooted forests on $n$ vertices that consist of $k$ trees and have one distinguished non-root vertex. This will yield the equality
$$(n-k)\, F_{n,k} = k\, n\, F_{n,k+1}. \qquad (3)$$

The left side is clear: In every forest $F \in \mathcal{F}_{n,k}$, we may choose any one of the $n-k$ non-root vertices as the distinguished vertex.

For the expression on the right side, consider a forest $F' \in \mathcal{F}_{n,k+1}$. Choose one of the $n$ vertices in $F'$, say $v$, and attach to it any one of the $k$ trees that do not contain $v$, by an edge joining $v$ to the root of that tree. The root of the chosen tree becomes the distinguished vertex. (This is illustrated in the figure.)




As there are $n$ choices for $v$ and $k$ choices for the tree that does not contain $v$, we get $knF_{n,k+1}$ choices altogether. All rooted forests in $\mathcal{F}_{n,k}$ with a distinguished non-root vertex arise uniquely in this process.


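Iterating (3) $n-1$ times, we obtain
$$\begin{aligned}
F_{n,1} &= \frac{n}{n-1}\, F_{n,2} = \frac{n}{n-1} \cdot \frac{2n}{n-2}\, F_{n,3} = \cdots = \frac{1 \cdot 2 \cdots (k-1)}{(n-1)(n-2)\cdots(n-k+1)}\, n^{k-1} F_{n,k}\\
&= \cdots = \frac{1 \cdot 2 \cdots (n-1)}{(n-1)!}\, n^{n-1} F_{n,n} = n^{n-1} F_{n,n}.
\end{aligned}$$
Since there is only one forest in $\mathcal{F}_{n,n}$ (each vertex being a rooted tree of its own), we have $F_{n,n} = 1$ and conclude that $F_{n,1} = n^{n-1}$, that is, $T_n = F_{n,1}/n = n^{n-2}$.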
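The double-counting identity (3) can be checked against a brute-force count of rooted forests for small $n$. In this sketch (names are ours), each forest with $k$ trees is found as an acyclic set of $n-k$ edges. It contributes the product of its tree sizes, which is the number of ways to choose the roots.

```python
from collections import Counter
from itertools import combinations
from math import comb, prod

def rooted_forest_counts(n):
    """F[k] = number of rooted forests on {1..n} with k trees, by brute force over edge sets."""
    all_edges = list(combinations(range(1, n + 1), 2))
    F = Counter()
    for m in range(n):                               # an acyclic set of m edges is a forest with n - m trees
        for subset in combinations(all_edges, m):
            parent = list(range(n + 1))

            def find(x):
                while parent[x] != x:
                    x = parent[x]
                return x

            acyclic = True
            for u, v in subset:
                ru, rv = find(u), find(v)
                if ru == rv:
                    acyclic = False
                    break
                parent[ru] = rv
            if acyclic:
                sizes = Counter(find(v) for v in range(1, n + 1))
                F[n - m] += prod(sizes.values())    # choose one root in each tree
    return F

for n in range(1, 7):
    F = rooted_forest_counts(n)
    assert all((n - k) * F[k] == k * n * F[k + 1] for k in range(1, n))   # identity (3)
    assert F[n] == 1 and F[1] == n ** (n - 1)
```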


Illustration of the right side of (3). 


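We get even more out of this proof, namely that
$$F_{n,k} = \frac{(n-1)(n-2)\cdots(n-k+1)}{1 \cdot 2 \cdots (k-1)}\, n^{n-k} = \binom{n-1}{k-1} n^{n-k} = \binom{n}{k}\, k\, n^{n-k-1}.$$
With $F_{n,k} = \binom{n}{k} T_{n,k}$ we have reproved the formula $T_{n,k} = k\, n^{n-k-1}$ without recourse to induction.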


Let us end with a historical note. Cayley’s paper from 1889 was anticipated 
by Carl W. Borchardt (1860), and this fact was acknowledged by Cayley 
himself. An equivalent result appeared even earlier in a paper of James J. 
Sylvester (1857), see [3, Chapter 3]. The novelty in Cayley’s paper was 
the use of graph theory terms, and the theorem has been associated with his 
name ever since. 
Identities versus bijections


Consider the infinite product $(1+x)(1+x^2)(1+x^3)(1+x^4)\cdots$ and expand it in the usual way into a series $\sum_{n \ge 0} a_n x^n$ by grouping together those products that yield the same power $x^n$. By inspection we find for the first terms
$$\prod_{k \ge 1} (1 + x^k) = 1 + x + x^2 + 2x^3 + 2x^4 + 3x^5 + 4x^6 + 5x^7 + \cdots. \qquad (1)$$
So we have e.g. $a_6 = 4$, $a_7 = 5$, and we (rightfully) suspect that $a_n$ goes to infinity with $n \to \infty$.

Looking at the equally simple product $(1-x)(1-x^2)(1-x^3)(1-x^4)\cdots$ something unexpected happens. Expanding this product we obtain
$$\prod_{k \ge 1} (1 - x^k) = 1 - x - x^2 + x^5 + x^7 - x^{12} - x^{15} + x^{22} + x^{26} - \cdots. \qquad (2)$$
It seems that all coefficients are equal to $1$, $-1$ or $0$. But is this true? And if so, what is the pattern?
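The expansions (1) and (2) are easy to reproduce with truncated formal power series. The sketch below (names are ours) represents a series by its list of coefficients up to $x^N$. Factors $1 \pm x^k$ with $k > N$ do not change these coefficients, so multiplying the factors for $k = 1, \dots, N$ suffices.

```python
def truncated_product(factors, N):
    """Multiply power series given as coefficient lists, keeping only the terms up to x^N."""
    result = [1] + [0] * N
    for f in factors:
        new = [0] * (N + 1)
        for i, a in enumerate(result):
            if a:
                for j, b in enumerate(f[: N + 1 - i]):
                    new[i + j] += a * b
        result = new
    return result

def factor(k, s, N):
    """Coefficient list of 1 + s*x^k (s = +1 or -1), truncated at x^N."""
    f = [1] + [0] * N
    if k <= N:
        f[k] += s
    return f

N = 30
plus = truncated_product((factor(k, +1, N) for k in range(1, N + 1)), N)
minus = truncated_product((factor(k, -1, N) for k in range(1, N + 1)), N)
print(plus[:8])                                     # [1, 1, 1, 2, 2, 3, 4, 5], as in (1)
print({i: c for i, c in enumerate(minus) if c})     # nonzero coefficients of (2) up to x^30
```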

Infinite sums and products and their convergence have played a central role 
in analysis since the invention of the calculus, and contributions to the 
subject have been made by some of the greatest names in the field, from 
Leonhard Euler to Srinivasa Ramanujan. 

In explaining identities such as (1) and (2), however, we disregard conver- 
gence questions — we simply manipulate the coefficients. In the language 
of the trade we deal with “formal” power series and products. In this frame- 
work we are going to show how combinatorial arguments lead to elegant 
proofs of seemingly difficult identities. 

Our basic notion is that of a partition of a natural number. We call any sum
$$n = \lambda_1 + \lambda_2 + \cdots + \lambda_t \quad\text{with}\quad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_t \ge 1$$
a partition of $n$. $P(n)$ shall be the set of all partitions of $n$, with $p(n) := |P(n)|$, where we set $p(0) = 1$.


What have partitions got to do with our problem? Well, consider the 
following product of infinitely many series: 


$$(1 + x + x^2 + x^3 + \cdots)(1 + x^2 + x^4 + x^6 + \cdots)(1 + x^3 + x^6 + x^9 + \cdots)\cdots \qquad (3)$$
where the $k$-th factor is $(1 + x^k + x^{2k} + x^{3k} + \cdots)$. What is the coefficient of $x^n$ when we expand this product into a series $\sum_{n \ge 0} a_n x^n$? A moment’s thought should convince you that this is just the number of ways to write $n$ as a sum
$$n = n_1 \cdot 1 + n_2 \cdot 2 + n_3 \cdot 3 + \cdots = \underbrace{1 + \cdots + 1}_{n_1} + \underbrace{2 + \cdots + 2}_{n_2} + \underbrace{3 + \cdots + 3}_{n_3} + \cdots.$$
So the coefficient is nothing else but the number $p(n)$ of partitions of $n$.

The partitions counted by $p(5) = 7$:
5 = 5
5 = 4+1
5 = 3+2
5 = 3+1+1
5 = 2+2+1
5 = 2+1+1+1
5 = 1+1+1+1+1

Partitions of 6 into odd parts: $p_o(6) = 4$:
6 = 5+1
6 = 3+3
6 = 3+1+1+1
6 = 1+1+1+1+1+1

The partitions of 7 into odd resp. distinct parts: $p_o(7) = p_d(7) = 5$.
Odd parts: 7 = 7, 7 = 5+1+1, 7 = 3+3+1, 7 = 3+1+1+1+1, 7 = 1+1+1+1+1+1+1
Distinct parts: 7 = 7, 7 = 6+1, 7 = 5+2, 7 = 4+3, 7 = 4+2+1


Since the geometric series $1 + x^k + x^{2k} + \cdots$ equals $\frac{1}{1 - x^k}$, we have proved our first identity:
$$\prod_{k \ge 1} \frac{1}{1 - x^k} = \sum_{n \ge 0} p(n)\, x^n. \qquad (4)$$
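Multiplying by $\frac{1}{1-x^k} = 1 + x^k + x^{2k} + \cdots$ amounts to a running sum with step $k$ on the coefficient list. The sketch below (names are ours) uses this to compute the coefficients of (4). When only odd $k$ are allowed, it gives the coefficients of the odd-part product $\prod_i 1/(1-x^{2i-1})$. Both are compared with a direct enumeration of partitions.

```python
def geometric_product(N, allowed=lambda k: True):
    """Coefficients up to x^N of the product of 1/(1 - x^k) over all allowed k >= 1."""
    c = [1] + [0] * N
    for k in range(1, N + 1):
        if allowed(k):
            for n in range(k, N + 1):     # multiply in place by 1 + x^k + x^{2k} + ...
                c[n] += c[n - k]
    return c

def partitions(n, largest=None):
    """All partitions of n as non-increasing tuples."""
    largest = n if largest is None else largest
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

N = 20
p = geometric_product(N)                             # formula (4)
p_odd = geometric_product(N, lambda k: k % 2 == 1)   # only odd summands allowed
assert p == [sum(1 for _ in partitions(n)) for n in range(N + 1)]
assert p_odd == [sum(1 for lam in partitions(n) if all(x % 2 for x in lam)) for n in range(N + 1)]
print(p[5], p_odd[6], p_odd[7])                      # 7 4 5, as in the margin
```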


What’s more, we see from our analysis that the factor $\frac{1}{1 - x^k}$ accounts for the contribution of $k$ to a partition of $n$. Thus, if we leave out $\frac{1}{1 - x^k}$ from the product on the left side of (4), then $k$ does not appear in any partition on the right side. As an example we immediately obtain
$$\prod_{i \ge 1} \frac{1}{1 - x^{2i-1}} = \sum_{n \ge 0} p_o(n)\, x^n, \qquad (5)$$
where $p_o(n)$ is the number of partitions of $n$ all of whose summands are odd, and the analogous statement holds when all summands are even.
By now it should be clear what the $n$-th coefficient in the infinite product $\prod_{k \ge 1}(1 + x^k)$ will be. Since we take from any factor in (3) either 1 or $x^k$, this means that we consider only those partitions where any summand $k$ appears at most once. In other words, our original product (1) is expanded into
$$\prod_{k \ge 1} (1 + x^k) = \sum_{n \ge 0} p_d(n)\, x^n, \qquad (6)$$
where $p_d(n)$ is the number of partitions of $n$ into distinct summands.


Now the method of formal series displays its full power. Since $1 - x^{2k} = (1 - x^k)(1 + x^k)$ we may write
$$\prod_{k \ge 1} (1 + x^k) = \prod_{k \ge 1} \frac{1 - x^{2k}}{1 - x^k} = \prod_{k \ge 1} \frac{1}{1 - x^{2k-1}},$$
since all factors $1 - x^{2i}$ with even exponent cancel out. So, the infinite products in (5) and (6) are the same, and hence also the series, and we obtain the beautiful result
$$p_o(n) = p_d(n) \quad\text{for all } n \ge 0. \qquad (7)$$


Such a striking equality demands a simple proof by bijection — at least that 
is the point of view of any combinatorialist. 
Problem. Let $P_o(n)$ and $P_d(n)$ be the partitions of $n$ into odd and into distinct summands, respectively: Find a bijection from $P_o(n)$ onto $P_d(n)$!


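Several bijections are known, but the following one due to J. W. L. Glaisher (1907) is perhaps the neatest. Let $\lambda$ be a partition of $n$ into odd parts. We collect equal summands and have
$$n = \underbrace{\lambda_1 + \cdots + \lambda_1}_{n_1} + \underbrace{\lambda_2 + \cdots + \lambda_2}_{n_2} + \cdots + \underbrace{\lambda_t + \cdots + \lambda_t}_{n_t} = n_1 \cdot \lambda_1 + n_2 \cdot \lambda_2 + \cdots + n_t \cdot \lambda_t.$$
Now we write $n_1 = 2^{m_1} + 2^{m_2} + \cdots + 2^{m_r}$ in its binary representation and similarly for the other $n_i$. The new partition $\lambda'$ of $n$ is then
$$\lambda':\; n = 2^{m_1}\lambda_1 + 2^{m_2}\lambda_1 + \cdots + 2^{m_r}\lambda_1 + 2^{k_1}\lambda_2 + \cdots.$$
We have to check that $\lambda'$ is in $P_d(n)$, and that $\phi: \lambda \mapsto \lambda'$ is indeed a bijection. Both claims are easy to verify: If $2^a\lambda_i = 2^b\lambda_j$ then $2^a = 2^b$ since $\lambda_i$ and $\lambda_j$ are odd, and so $\lambda_i = \lambda_j$. Since a binary representation uses each power of 2 at most once, all parts of $\lambda'$ are distinct; hence $\lambda'$ is in $P_d(n)$. Conversely, when $n = \mu_1 + \mu_2 + \cdots + \mu_s$ is a partition into distinct summands, then we reverse the bijection by collecting all $\mu_i$ with the same highest power of 2, and write down the odd parts with the proper multiplicity. The margin displays an example.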
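Here are Glaisher’s map and its inverse in code. This is a sketch with our own function names. The forward map splits each multiplicity into powers of 2. For the inverse we use an equivalent direct rule: a distinct part $2^m \cdot q$ with $q$ odd becomes $2^m$ copies of $q$.

```python
from collections import Counter

def glaisher(odd_parts):
    """Odd-part partition -> distinct-part partition: split each multiplicity n_i into powers of 2."""
    parts = []
    for part, mult in Counter(odd_parts).items():
        power = 1
        while mult:
            if mult & 1:                   # binary digit of the multiplicity
                parts.append(power * part)
            mult >>= 1
            power <<= 1
    return sorted(parts, reverse=True)

def glaisher_inverse(distinct_parts):
    """Distinct-part partition -> odd-part partition: 2^m * q (q odd) becomes 2^m copies of q."""
    parts = []
    for mu in distinct_parts:
        power = 1
        while mu % 2 == 0:
            mu //= 2
            power *= 2
        parts.extend([mu] * power)
    return sorted(parts, reverse=True)

def partitions(n, largest=None):
    """All partitions of n as non-increasing tuples."""
    largest = n if largest is None else largest
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

print(glaisher([5, 5, 5, 3, 3, 1, 1, 1, 1]))     # the first margin example
print(glaisher_inverse([12, 6, 5, 4, 3]))        # the second margin example
for n in range(1, 21):
    odd = [list(lam) for lam in partitions(n) if all(x % 2 for x in lam)]
    distinct = {lam for lam in partitions(n) if len(set(lam)) == len(lam)}
    assert {tuple(glaisher(lam)) for lam in odd} == distinct
    assert all(glaisher_inverse(glaisher(lam)) == lam for lam in odd)
```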


Manipulating formal products has thus led to the equality $p_o(n) = p_d(n)$ for partitions, which we then verified via a bijection. Now we turn this around, give a bijection proof for partitions and deduce an identity. This time our goal is to identify the pattern in the expansion (2).


Look at
$$1 - x - x^2 + x^5 + x^7 - x^{12} - x^{15} + x^{22} + x^{26} - x^{35} - x^{40} + x^{51} + x^{57} - x^{70} - x^{77} + \cdots.$$


The exponents (apart from 0) seem to come in pairs, and taking the expo- 
nents of the first power in each pair gives the sequence 


1 5 12 22 35 51 70 


well-known to Euler. These are the pentagonal numbers f(j), whose name 
is suggested by the figure in the margin. 


We easily compute $f(j) = \frac{3j^2 - j}{2}$ and $\bar f(j) = \frac{3j^2 + j}{2}$ for the other number of each pair. In summary, we conjecture, as Euler has done, that the following formula should hold.


Theorem.
$$\prod_{k \ge 1} (1 - x^k) = 1 + \sum_{j \ge 1} (-1)^j \Big( x^{\frac{3j^2 - j}{2}} + x^{\frac{3j^2 + j}{2}} \Big). \qquad (8)$$


For example,
$\lambda:\; 25 = 5+5+5+3+3+1+1+1+1$
is mapped by $\phi$ to
$\lambda':\; 25 = (2+1)5 + (2)3 + (4)1 = 10+5+6+4 = 10+6+5+4.$

We write
$\lambda':\; 30 = 12+6+5+4+3$
as $30 = 4(3+1) + 2(3) + 1(5+3) = (1)5 + (4+2+1)3 + (4)1$
and obtain as $\phi^{-1}(\lambda')$ the partition
$\lambda:\; 30 = 5+3+3+3+3+3+3+3+1+1+1+1$
into odd summands.


Pentagonal numbers 
As an example consider $n = 15$, $j = 2$, so $b(2) = 7$. The partition $3+2+2+1$ in $P(15 - b(2)) = P(8)$ is mapped to $9+2+1+1$, which is in $P(15 - b(1)) = P(13)$.


Euler proved this remarkable theorem by calculations with formal series,
but we give a bijection proof from The Book. First of all, we notice by (4)
that the product $\prod_{k\ge1}(1-x^k)$ is precisely the inverse of our partition series
$\sum_{n\ge0} p(n)x^n$. Hence setting $\prod_{k\ge1}(1-x^k) =: \sum_{n\ge0} c(n)x^n$, we find
$$\Big(\sum_{n\ge0} c(n)x^n\Big)\cdot\Big(\sum_{n\ge0} p(n)x^n\Big) = 1.$$


Comparing coefficients this means that $c(n)$ is the unique sequence with
$c(0) = 1$ and
$$\sum_{k=0}^{n} c(k)\,p(n-k) = 0 \quad\text{for all } n \ge 1. \tag{9}$$
k=0 
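Writing the right-hand side of (8) as $\sum_{j=-\infty}^{\infty}(-1)^j x^{\frac{3j^2+j}{2}}$, we have to show that
$$c(k) = \begin{cases} 1 & \text{for } k = \frac{3j^2+j}{2} \text{ when } j \in \mathbb{Z} \text{ is even},\\ -1 & \text{for } k = \frac{3j^2+j}{2} \text{ when } j \in \mathbb{Z} \text{ is odd},\\ 0 & \text{otherwise} \end{cases}$$
gives this unique sequence. Setting $b(j) = \frac{3j^2+j}{2}$ for $j \in \mathbb{Z}$ and substituting
these values into (9), our conjecture takes on the simple form
$$\sum_{j\ \text{even}} p(n-b(j)) = \sum_{j\ \text{odd}} p(n-b(j)) \quad\text{for all } n \ge 1,$$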


where of course we only consider $j$ with $b(j) \le n$. So the stage is set: We
have to find a bijection
$$\phi:\ \bigcup_{j\ \text{even}} P(n-b(j)) \longrightarrow \bigcup_{j\ \text{odd}} P(n-b(j)).$$
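Before looking for $\phi$, one can confirm numerically that the two unions have the same size. In this sketch (our own, with hypothetical names) $p(n)$ comes from the standard dynamic program for partition numbers, which does not rely on (8), and `pentagonal_sums` returns both sides of the identity above.

```python
def partition_counts(N):
    """p(0), ..., p(N) by the usual dynamic program over the allowed part sizes."""
    p = [1] + [0] * N
    for part in range(1, N + 1):
        for n in range(part, N + 1):
            p[n] += p[n - part]
    return p


def b(j):
    """b(j) = (3j^2 + j)/2 for integers j; j(3j + 1) is always even."""
    return (3 * j * j + j) // 2


def pentagonal_sums(n, p):
    """(sum over even j, sum over odd j) of p(n - b(j)), over all j with b(j) <= n."""
    even = odd = 0
    for j in range(-n - 1, n + 2):   # b(j) >= j^2, so this range is more than enough
        if b(j) <= n:
            if j % 2 == 0:
                even += p[n - b(j)]
            else:
                odd += p[n - b(j)]
    return even, odd


p = partition_counts(200)
print(pentagonal_sums(15, p))   # (240, 240)
```

For $n = 15$ the even side is $p(15) + p(8) + p(10) = 176 + 22 + 42$ and the odd side is $p(13) + p(14) + p(0) + p(3) = 101 + 135 + 1 + 3$.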
Again several bijections have been suggested, but the following construc- 
tion by David Bressoud and Doron Zeilberger is astonishingly simple. We 
just give the definition of $\phi$ (which is, in fact, an involution), and invite the
reader to verify the easy details. 


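For $\lambda: \lambda_1 + \cdots + \lambda_t \in P(n-b(j))$ set
$$\phi(\lambda) = \begin{cases} (t+3j-1) + (\lambda_1-1) + \cdots + (\lambda_t-1) & \text{if } t+3j \ge \lambda_1,\\[4pt] (\lambda_2+1) + \cdots + (\lambda_t+1) + \underbrace{1+\cdots+1}_{\lambda_1-t-3j-1} & \text{if } t+3j < \lambda_1, \end{cases}$$
where we leave out possible 0's. One finds that in the first case $\phi(\lambda)$ is in
$P(n-b(j-1))$, and in the second case in $P(n-b(j+1))$.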
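The "easy details" can be spot-checked by computer. The following is our own transcription of $\phi$ into Python, not code from Bressoud and Zeilberger. One point the formula leaves implicit is the empty partition when $j < 0$. As an assumption of this sketch we read it as the single part 0, so that the second case applies and produces $1 + \cdots + 1$ ($-3j-2$ ones) in $P(n-b(j+1))$. The test walks through all pairs $(\lambda, j)$ for small $n$ and checks that $\phi$ lands in the right set and is an involution.

```python
def partitions(n, max_part=None):
    """Yield all partitions of n as nonincreasing lists."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


def b(j):
    return (3 * j * j + j) // 2


def phi(parts, j):
    """Map lam in P(n - b(j)) to (phi(lam), j - 1) or (phi(lam), j + 1)."""
    if not parts and j < 0:
        parts = [0]          # our edge-case convention for the empty partition
    t = len(parts)
    lam1 = parts[0] if parts else 0
    if t + 3 * j >= lam1:
        # first case: new largest part t + 3j - 1, every old part decreased by 1
        new, new_j = [t + 3 * j - 1] + [x - 1 for x in parts], j - 1
    else:
        # second case: drop lam1, increase the other parts by 1, append ones
        new = [x + 1 for x in parts[1:]] + [1] * (lam1 - t - 3 * j - 1)
        new_j = j + 1
    return [x for x in new if x > 0], new_j   # leave out possible 0's


print(phi([3, 2, 2, 1], 2))   # ([9, 2, 1, 1], 1), the margin example for n = 15
```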


This was beautiful, and we can get even more out of it. We already know
that
$$\prod_{k\ge1}(1+x^k) = \sum_{n\ge0} p_d(n)\,x^n.$$
As experienced formal series manipulators we notice that the introduction
of the new variable $y$ yields
$$\prod_{k\ge1}(1+yx^k) = \sum_{n,m\ge0} p_{d,m}(n)\,x^n y^m,$$
where $p_{d,m}(n)$ counts the partitions of $n$ into precisely $m$ distinct summands. With $y = -1$ this yields
$$\prod_{k\ge1}(1-x^k) = \sum_{n\ge0}\big(E_d(n) - O_d(n)\big)\,x^n, \tag{10}$$


where $E_d(n)$ is the number of partitions of $n$ into an even number of distinct
parts, and $O_d(n)$ is the number of partitions into an odd number. And here
is the punchline. Comparing (10) to Euler's expansion in (8) we infer the
beautiful result
$$E_d(n) - O_d(n) = \begin{cases} 1 & \text{for } n = \frac{3j^2\pm j}{2} \text{ when } j \ge 0 \text{ is even},\\ -1 & \text{for } n = \frac{3j^2\pm j}{2} \text{ when } j \ge 1 \text{ is odd},\\ 0 & \text{otherwise}. \end{cases}$$
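The final formula is easy to test by listing partitions into distinct parts. Here is a brute-force Python sketch with hypothetical helper names:

```python
def distinct_partitions(n, max_part=None):
    """Yield the partitions of n into distinct parts, as decreasing lists."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in distinct_partitions(n - first, first - 1):
            yield [first] + rest


def even_minus_odd(n):
    """E_d(n) - O_d(n), counted directly."""
    return sum(1 if len(p) % 2 == 0 else -1 for p in distinct_partitions(n))


def predicted(n):
    """Right-hand side of the formula: +1 or -1 at n = (3j^2 -+ j)/2, else 0."""
    for j in range(n + 1):
        if n in ((3 * j * j - j) // 2, (3 * j * j + j) // 2):
            return 1 if j % 2 == 0 else -1
    return 0


print([even_minus_odd(n) for n in range(13)])
# [1, -1, -1, 0, 0, 1, 0, 1, 0, 0, 0, 0, -1]
```

The printed list reproduces the first coefficients of Euler's product, as (10) and (8) predict.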


This is, of course, just the beginning of a longer and still ongoing story. The 
theory of infinite products is replete with unexpected identities, and with
their bijective counterparts. The most famous examples are the so-called
Rogers–Ramanujan identities, named after Leonard Rogers and Srinivasa
Ramanujan, in which the number 5 plays a mysterious role:


$$\prod_{k\ge1}\frac{1}{(1-x^{5k-4})(1-x^{5k-1})} = \sum_{n\ge0}\frac{x^{n^2}}{(1-x)(1-x^2)\cdots(1-x^n)},$$

$$\prod_{k\ge1}\frac{1}{(1-x^{5k-3})(1-x^{5k-2})} = \sum_{n\ge0}\frac{x^{n^2+n}}{(1-x)(1-x^2)\cdots(1-x^n)}.$$
The reader is invited to translate them into the following partition identities 
first noted by Percy MacMahon: 


• Let $f(n)$ be the number of partitions of $n$ all of whose summands are
of the form $5k+1$ or $5k+4$, and $g(n)$ the number of partitions whose
summands differ by at least 2. Then $f(n) = g(n)$.


• Let $r(n)$ be the number of partitions of $n$ all of whose summands are
of the form $5k+2$ or $5k+3$, and $s(n)$ the number of partitions whose
parts differ by at least 2 and which do not contain 1. Then $r(n) = s(n)$.
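MacMahon's versions can be tested by brute force for small $n$. The sketch below (our own; the function names are hypothetical) counts all four quantities for one $n$ by running through every partition of $n$.

```python
def partitions(n, max_part=None):
    """Yield all partitions of n as nonincreasing lists."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


def macmahon_counts(n):
    """Return (f(n), g(n), r(n), s(n)) as defined in the two bullet points."""
    f = g = r = s = 0
    for p in partitions(n):
        if all(x % 5 in (1, 4) for x in p):
            f += 1
        if all(x % 5 in (2, 3) for x in p):
            r += 1
        if all(a - c >= 2 for a, c in zip(p, p[1:])):   # neighbouring parts differ by >= 2
            g += 1
            if 1 not in p:
                s += 1
    return f, g, r, s


print(macmahon_counts(10))   # (6, 6, 4, 4)
```

For $n = 10$, for instance, $g$ counts $10$, $9+1$, $8+2$, $7+3$, $6+4$, $6+3+1$, and $s$ keeps the four of these without a part 1.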


All known formal series proofs of the Rogers–Ramanujan identities are
quite involved, and for a long time bijection proofs of $f(n) = g(n)$ and
of $r(n) = s(n)$ seemed elusive. Such proofs were eventually given in 1981
by Adriano Garsia and Stephen Milne. Their bijections are, however, very
complicated; Book proofs are not yet in sight.


An example for $n = 10$:

10 = 9+1
10 = 8+2
10 = 7+3
10 = 6+4
10 = 4+3+2+1

and

10 = 10
10 = 7+2+1
10 = 6+3+1
10 = 5+4+1
10 = 5+3+2,

so $E_d(10) = O_d(10) = 5$.


[Figure: Srinivasa Ramanujan (India postage stamp)]
Chapter 35

The finite Kakeya problem


“How small can a set in the plane be in which you can turn a needle 
of length 1 completely around?” 


This beautiful question was posed by the Japanese mathematician Sōichi
Kakeya in 1917. It gained immediate prominence and, together with its 
higher-dimensional analogs, helped initiate a whole new field, today called 
geometric measure theory. To be precise, by “turning around” Kakeya had 
a continuous motion in mind that returns the needle to the original position 
with its ends reversed, like a Samurai whirling his pole. Any such motion 
takes place in a compact subset of the plane. 


Obviously, a disk of diameter 1 is such a Kakeya needle set (of area $\frac{\pi}{4} \approx 0.785$), as is the equilateral triangle of height 1 that has area $\frac{1}{\sqrt3} \approx 0.577$.
For convex regions Julius Pál showed that this is the minimum, but in general we can do better: The three-pointed deltoid in the margin is also a
Kakeya needle set, as seen by moving the inner point around the small
circle. The area of the deltoid is $\frac{\pi}{8} \approx 0.393$, and Kakeya seems to have
thought that this is the minimum for connected sets.


So it was a big surprise when a few years after the question was posed 
Abram Samoilovitch Besicovitch produced needle sets of arbitrarily small 
area. His examples were rather complicated with many holes and large 
diameter, but in a remarkable paper Frederick Cunningham Jr. showed that 
one can even find simply connected needle sets of arbitrarily small area 
inside the circle of diameter 2. 


As a matter of fact, Besicovitch was initially interested in a closely related 
problem, which he then applied to solve the needle problem. Call a compact 
set $K \subseteq \mathbb{R}^n$ a Kakeya set (or, more aptly, a Besicovitch set) if it contains
a unit line segment in every direction. Besicovitch proved the spectacular 
result that for every dimension there are Kakeya sets of measure 0. But how 
can this be? Our intuition tells us that these sets need to be somehow spread 
out, since they contain segments in every direction! (In contrast, one can 
show that all Kakeya needle sets, which not only contain a needle in every 
direction, but in which the needle can turn, have positive measure.) 


Now these were the years when the notion of (topological) dimension came 
into being at the hands of Lebesgue, Menger, Hausdorff and others, which 
precisely captured this “spreading out” by various covering conditions; here 
we use the Hausdorff dimension $\mathrm{hd}(K)$. We don't need the details of the
definition: Let us just note that the Euclidean space $\mathbb{R}^n$ has Hausdorff
dimension $n$, and that hd is a monotone function, so every $K \subseteq \mathbb{R}^n$
satisfies $\mathrm{hd}(K) \le n$.


The Kakeya conjecture. Every Kakeya set in $\mathbb{R}^n$ has Hausdorff
dimension $n$.


The conjecture is true for $n = 1$ and $2$, but it is open for all $n \ge 3$,
and it appears to get more difficult as the dimension increases. Today it
is considered to be one of the major open problems in geometric measure
theory.


In an inspiring paper from 1999 Thomas Wolff gave the problem a completely new twist by suggesting that one look at finite fields $F$. Consider the
vector space $F^n$. Let us call $K \subseteq F^n$ a (finite) Kakeya set if $K$ contains a
line in every direction, meaning that to every nonzero vector $v \in F^n$ there
exists some $w \in F^n$ such that the line $L = \{w + tv : t \in F\}$ is in $K$.
Wolff posed the following finite version of the Euclidean Kakeya problem:


The finite Kakeya problem. Is there a constant $c = c(n)$, only
depending on $n$ but not on $|F|$, such that every Kakeya set $K \subseteq F^n$
satisfies
$$|K| \ge c\,|F|^n\,?$$


Clearly, this is true for $n = 1$, the only Kakeya set being all of $F$, and
it is not hard to prove for $n = 2$, but for higher dimensions progress was
again slow, until Zeev Dvir provided in his 2008 dissertation a beautiful and
stunningly simple proof: All we need are two results about polynomials in
$n$ variables!

Let us fix some notation. $F[x_1,\dots,x_n]$ denotes the ring of polynomials
$p(x_1,\dots,x_n)$ over the finite field $F$. A monomial $x_1^{s_1}\cdots x_n^{s_n}$ is sometimes written shortly as $x^s$, where $\sum_{i=1}^n s_i$ is the degree of $x^s$. The degree
$\deg p$ of $p(x) = \sum_s a_s x^s$ is the maximum degree of the monomials $x^s$ with
nonzero coefficient $a_s$. The zero polynomial has all $a_s = 0$ and is said to
have degree $-1$. The polynomial $p(x)$ vanishes on $E \subseteq F^n$ if $p(a) = 0$
holds for all $a \in E$.

The two ingredients of the proof generalize the following well-known facts
about polynomials in one variable:


(1) Every polynomial of degree $d \ge 0$ in one variable has at most $d$ roots.


(2) For every set $E \subseteq F$ of size $|E| \le d$ there is a nonzero polynomial
$p(x)$ of degree at most $d$ that vanishes on $E$. (Just take $p_E(x) = \prod_{a\in E}(x-a)$. In particular, a nonzero polynomial can vanish on all of $F$.)


In the following $q = |F|$ shall denote the size of $F$.
Lemma 1. Every nonzero polynomial $p(x) \in F[x_1,\dots,x_n]$ of degree $d$
has at most $d\,q^{n-1}$ roots in $F^n$.


Proof. We use induction on $n$, with fact (1) above as the starting case
$n = 1$. Let us split $p(x)$ into summands according to the powers of $x_n$,
$$p(x) = g_0 + g_1x_n + g_2x_n^2 + \cdots + g_\ell x_n^\ell,$$
where $g_i \in F[x_1,\dots,x_{n-1}]$ for $0 \le i \le \ell \le d$, and $g_\ell$ is nonzero.
We write every $v \in F^n$ in the form $v = (a,b)$ with $a \in F^{n-1}$, $b \in F$, and
estimate the number of roots $p(a,b) = 0$.

Case 1. Roots $(a,b)$ with $g_\ell(a) = 0$.

Since $g_\ell \ne 0$ and $\deg g_\ell \le d-\ell$, by induction the polynomial $g_\ell$ has at
most $(d-\ell)q^{n-2}$ roots in $F^{n-1}$, and for each $a$ there are at most $q$ different
choices for $b$, which gives at most $(d-\ell)q^{n-1}$ such roots for $p(x)$ in $F^n$.

Case 2. Roots $(a,b)$ with $g_\ell(a) \ne 0$.

Here $p(a,x_n) \in F[x_n]$ is not the zero polynomial in the single variable $x_n$:
it has degree $\ell$, and hence for each $a$ by (1) there are at most $\ell$ elements $b$
with $p(a,b) = 0$. Since the number of $a$'s is at most $q^{n-1}$ we get at most
$\ell q^{n-1}$ roots for $p(x)$ in this way.

Summing the two cases gives at most
$$(d-\ell)q^{n-1} + \ell q^{n-1} = d\,q^{n-1}$$
roots for $p(x)$, as asserted.
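Lemma 1 can be tested exhaustively on small examples. This sketch assumes $F = \mathbb{Z}/q\mathbb{Z}$ with $q$ prime, so that field arithmetic is just arithmetic mod $q$ (general finite fields would need more machinery), and stores a polynomial as a dictionary from exponent tuples to nonzero coefficients. All names are our own.

```python
import itertools
import random


def random_poly(n, d, q, rng):
    """Random polynomial over Z/qZ in n variables with total degree <= d."""
    poly = {}
    for exps in itertools.product(range(d + 1), repeat=n):
        if sum(exps) <= d and rng.random() < 0.5:
            coeff = rng.randrange(q)
            if coeff:
                poly[exps] = coeff
    return poly


def degree(poly):
    """Total degree; -1 for the zero polynomial, as in the text."""
    return max((sum(e) for e in poly), default=-1)


def evaluate(poly, point, q):
    total = 0
    for exps, coeff in poly.items():
        term = coeff
        for x, e in zip(point, exps):
            term = term * pow(x, e, q) % q
        total = (total + term) % q
    return total


def count_roots(poly, n, q):
    return sum(1 for pt in itertools.product(range(q), repeat=n)
               if evaluate(poly, pt, q) == 0)


def check_lemma1(trials=300, seed=2024):
    """Compare root counts with d * q^(n-1) for random nonzero polynomials."""
    rng = random.Random(seed)
    for _ in range(trials):
        n, q, d = rng.randint(1, 3), rng.choice([2, 3, 5]), rng.randint(0, 4)
        poly = random_poly(n, d, q, rng)
        if poly and count_roots(poly, n, q) > degree(poly) * q ** (n - 1):
            return False
    return True


print(check_lemma1())   # True, as Lemma 1 guarantees
```

The bound is attained, for instance, by $x_1^2 - x_1$ over $\mathbb{Z}/5\mathbb{Z}$ in two variables, which has $2 \cdot 5$ roots.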


Lemma 2. For every set $E \subseteq F^n$ of size $|E| < \binom{n+d}{d}$ there is a nonzero
polynomial $p(x) \in F[x_1,\dots,x_n]$ of degree at most $d$ that vanishes on $E$.


Proof. Consider the vector space $V_d$ of all polynomials in $F[x_1,\dots,x_n]$
of degree at most $d$. A basis for $V_d$ is provided by the monomials $x_1^{s_1}\cdots x_n^{s_n}$
with $\sum s_i \le d$:
$$1,\ x_1,\dots,x_n,\ x_1^2,\ x_1x_2,\dots,x_n^2,\ \dots,\ x_1^d,\dots,x_n^d.$$


The following pleasing argument shows that the number of monomials
$x_1^{s_1}\cdots x_n^{s_n}$ of degree at most $d$ equals the binomial coefficient $\binom{n+d}{d}$. What
we want to count is the number of $n$-tuples $(s_1,\dots,s_n)$ of nonnegative integers with $s_1+\cdots+s_n \le d$. To do this, we map every $n$-tuple $(s_1,\dots,s_n)$
to the increasing sequence
$$s_1+1 < s_1+s_2+2 < \cdots < s_1+\cdots+s_n+n,$$
which determines an $n$-subset of $\{1,2,\dots,d+n\}$. The map is bijective,
so the number of monomials is $\binom{d+n}{n} = \binom{n+d}{d}$.
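The counting bijection is short enough to run directly. Here is a Python sketch with our own helper names:

```python
from itertools import combinations, product
from math import comb


def exponent_tuples(n, d):
    """All (s_1, ..., s_n) with s_i >= 0 and s_1 + ... + s_n <= d."""
    return [s for s in product(range(d + 1), repeat=n) if sum(s) <= d]


def to_subset(s):
    """(s_1, ..., s_n) -> (s_1 + 1, s_1 + s_2 + 2, ..., s_1 + ... + s_n + n)."""
    result, partial = [], 0
    for i, s_i in enumerate(s, start=1):
        partial += s_i
        result.append(partial + i)
    return tuple(result)


n, d = 2, 3
tuples = exponent_tuples(n, d)
print(len(tuples), comb(n + d, d))   # 10 10
```

The test below checks, for small $n$ and $d$, that the images are exactly the $n$-subsets of $\{1,\dots,n+d\}$.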


Next look at the vector space $F^E$ of all functions $f: E \to F$; it has
dimension $|E|$, which by assumption is less than $\binom{n+d}{d} = \dim V_d$. The
evaluation map $p(x) \mapsto (p(a))_{a\in E}$ from $V_d$ to $F^E$ is a linear map of vector
spaces. Since its domain has larger dimension than its target, it has a nonzero kernel, containing as desired a
nonzero polynomial that vanishes on $E$.


For $n = 2$ and $d = 3$ we get a basis of
size $\binom{5}{3} = 10$: $\{1, x_1, x_2, x_1^2, x_1x_2, x_2^2, x_1^3, x_1^2x_2, x_1x_2^2, x_2^3\}$.
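The proof of Lemma 2 also tells us how to find such a polynomial: it is any nonzero kernel vector of the evaluation map, which Gaussian elimination produces. The sketch below assumes a prime field $F = \mathbb{Z}/q\mathbb{Z}$ (so inverses come from Fermat's little theorem) and uses hypothetical helper names; it is an illustration, not an efficient implementation.

```python
from itertools import product


def monomials(n, d):
    """Exponent tuples of the monomials spanning V_d."""
    return [s for s in product(range(d + 1), repeat=n) if sum(s) <= d]


def eval_monomial(s, point, q):
    value = 1
    for x, e in zip(point, s):
        value = value * pow(x, e, q) % q
    return value


def evaluate(poly, point, q):
    return sum(c * eval_monomial(s, point, q) for s, c in poly.items()) % q


def vanishing_polynomial(points, n, d, q):
    """Nonzero p in V_d over Z/qZ (q prime) with p(a) = 0 for all a in points."""
    mons = monomials(n, d)
    if len(points) >= len(mons):
        raise ValueError("Lemma 2 needs |E| < C(n+d, d)")
    # matrix of the evaluation map: one row per point, one column per monomial
    rows = [[eval_monomial(s, pt, q) for s in mons] for pt in points]
    pivot_cols, r = [], 0
    for c in range(len(mons)):                        # reduced row echelon form mod q
        pivot = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if pivot is None:
            continue                                  # column c stays free
        rows[r], rows[pivot] = rows[pivot], rows[r]
        inv = pow(rows[r][c], q - 2, q)               # inverse mod q (Fermat)
        rows[r] = [v * inv % q for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                factor = rows[i][c]
                rows[i] = [(a - factor * b) % q for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
    # more columns than rows, so some column is free: set it to 1, the other free ones to 0
    free = next(c for c in range(len(mons)) if c not in pivot_cols)
    coeffs = [0] * len(mons)
    coeffs[free] = 1
    for row, c in enumerate(pivot_cols):
        coeffs[c] = -rows[row][free] % q
    return {s: a for s, a in zip(mons, coeffs) if a}


points = [(0, 0), (1, 2), (2, 4), (3, 1)]           # four points of (Z/5Z)^2
poly = vanishing_polynomial(points, n=2, d=2, q=5)  # 4 < C(4, 2) = 6
print(all(evaluate(poly, a, 5) == 0 for a in points))
```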
Now we have all things needed to give Dvir's elegant solution of the finite
Kakeya problem.


Theorem. Let $K \subseteq F^n$ be a Kakeya set. Then
$$|K| \ge \binom{|F|+n-1}{n} \ge \frac{|F|^n}{n!}.$$


Proof. The second inequality is clear from the definition of binomial
coefficients. For the first, set again $q = |F|$ and suppose for a contradiction
that
$$|K| < \binom{q+n-1}{n} = \binom{n+q-1}{q-1}.$$
By Lemma 2 there exists a nonzero polynomial $p(x) \in F[x_1,\dots,x_n]$ of
degree $d \le q-1$ that vanishes on $K$. Let us write
$$p(x) = p_0(x) + p_1(x) + \cdots + p_d(x), \tag{1}$$
where $p_i(x)$ is the sum of the monomials of degree $i$; in particular, $p_d(x)$ is
nonzero. Since $p(x)$ vanishes on the nonempty set $K$, we have $d > 0$. Take
any $v \in F^n \setminus \{0\}$. By the Kakeya property for this $v$ there exists a $w \in F^n$
such that
$$p(w+tv) = 0 \quad\text{for all } t \in F.$$
Here comes the trick: Consider $p(w+tv)$ as a polynomial in the single
variable $t$. It has degree at most $d \le q-1$ but vanishes on all $q$ points of $F$,
whence $p(w+tv)$ is the zero polynomial in $t$. Looking at (1) above we
see that the coefficient of $t^d$ in $p(w+tv)$ is precisely $p_d(v)$, which must
therefore be 0. But $v \in F^n \setminus \{0\}$ was arbitrary and $p_d(0) = 0$ since $d > 0$,
and we conclude that $p_d(x)$ vanishes on all of $F^n$. Since
$$d\,q^{n-1} \le (q-1)\,q^{n-1} < q^n,$$
Lemma 1, however, tells us that $p_d(x)$ must then be the zero polynomial:
contradiction and end of the proof.
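For $n = 2$ and a small prime $q$ the theorem can be compared with the true minimum, found by brute force. The sketch below (our own, assuming $F = \mathbb{Z}/q\mathbb{Z}$ with $q$ prime) uses the observation that every Kakeya set contains a union of one line per direction, and that such a union is itself a Kakeya set, so it suffices to minimize over these unions. Dvir's bound here reads $|K| \ge \binom{q+1}{2}$.

```python
from itertools import product
from math import comb


def directions(q):
    """One vector per direction of (Z/qZ)^2: (1, m) for each m, and (0, 1)."""
    return [(1, m) for m in range(q)] + [(0, 1)]


def line(w, v, q):
    return frozenset(((w[0] + t * v[0]) % q, (w[1] + t * v[1]) % q) for t in range(q))


def parallel_class(v, q):
    """The q distinct lines with direction v."""
    return list({line(w, v, q) for w in product(range(q), repeat=2)})


def min_kakeya_size(q):
    """Size of a smallest Kakeya set in (Z/qZ)^2 (q prime), by exhaustive search."""
    best = q * q
    for choice in product(*(parallel_class(v, q) for v in directions(q))):
        best = min(best, len(frozenset().union(*choice)))
    return best


for q in (2, 3, 5):
    print(q, min_kakeya_size(q), comb(q + 1, 2))   # the middle value is >= the last
```

The search tries $q^{q+1}$ combinations, so it is only practical for very small $q$.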


As often happens in mathematics, once a breakthrough is achieved improvements follow quickly. So it was in this case. The lower bound $\frac{1}{n!}$
for the constant $c(n)$ has been improved to $\frac{1}{2^n}$, and this is within a factor
of 2 of the best possible bound. That is, there exist Kakeya sets of size
roughly $\frac{1}{2^{n-1}}|F|^n$.

For recent developments the blog by Terence Tao at terrytao.wordpress.com/tag/kakeya-conjecture/ is an up-to-date source.
[Figure: “Whirling a pole the Kakeya way”]


Completing Latin squares 


Some of the oldest combinatorial objects, whose study apparently goes 
back to ancient times, are the Latin squares. To obtain a Latin square, one 
has to fill the $n^2$ cells of an $n \times n$ square array with the numbers $1, 2, \dots, n$
so that every number appears exactly once in every row and in every
column. In other words, the rows and columns each represent permutations
of the set $\{1,\dots,n\}$. Let us call $n$ the order of the Latin square.


Here is the problem we want to discuss. Suppose someone started filling 
the cells with the numbers {1,2,...,n}. At some point he stops and asks 
us to fill in the remaining cells so that we get a Latin square. When is this 
possible? In order to have a chance at all we must, of course, assume that at 
the start of our task any element appears at most once in every row and in 
every column. Let us give this situation a name. We speak of a partial Latin 
square of order $n$ if some cells of an $n \times n$ array are filled with numbers
from the set $\{1,\dots,n\}$ such that every number appears at most once in
every row and column. So the problem is: 


When can a partial Latin square be completed to a Latin square of 
the same order? 


Let us look at a few examples. Suppose the first n — 1 rows are filled and 
the last row is empty. Then we can easily fill in the last row. Just note that 
every element appears n — 1 times in the partial Latin square and hence is 
missing from exactly one column. Hence by writing each element below 
the column where it is missing we have completed the square correctly. 


Going to the other end, suppose only the first row is filled. Then it is again 
easy to complete the square by cyclically rotating the elements one step in 
each of the following rows. 
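Both easy cases above take only a line or two of code. Here is a Python sketch with hypothetical function names, using the entries $1, \dots, n$:

```python
def complete_from_first_row(first_row):
    """Row i is the first row rotated cyclically by i steps."""
    n = len(first_row)
    return [first_row[i:] + first_row[:i] for i in range(n)]


def fill_last_row(rows):
    """Given the first n - 1 rows, write each element below the column where it is missing."""
    n = len(rows[0])
    return [(set(range(1, n + 1)) - {row[j] for row in rows}).pop() for j in range(n)]


def is_latin(square):
    n = len(square)
    symbols = set(range(1, n + 1))
    return (all(set(row) == symbols for row in square)
            and all({row[j] for row in square} == symbols for j in range(n)))


square = complete_from_first_row([3, 1, 4, 2])
print(square)   # [[3, 1, 4, 2], [1, 4, 2, 3], [4, 2, 3, 1], [2, 3, 1, 4]]
```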

So, while in our first example the completion is forced, we have lots of 
possibilities in the second example. In general, the fewer cells are pre- 
filled, the more freedom we should have in completing the square. 
However, the margin displays an example of a partial square with only n 
cells filled which clearly cannot be completed, since there is no way to fill 
the upper right-hand corner without violating the row or column condition. 


If fewer than $n$ cells are filled in an $n \times n$ array, can one then always
complete it to obtain a Latin square? 
This question was raised by Trevor Evans in 1960, and the assertion that
a completion is always possible quickly became known as the Evans con- 
jecture. Of course, one would try induction, and this is what finally led 
to success. But Bohdan Smetaniuk’s proof from 1981, which answered 
the question, is a beautiful example of just how subtle an induction proof 
may be needed in order to do such a job. And, what’s more, the proof is 
constructive: it allows us to complete the Latin square explicitly from any
initial partial configuration. 


Before proceeding to the proof let us take a closer look at Latin squares in
general. We can alternatively view a Latin square as a $3 \times n^2$ array, called
the line array of the Latin square. The figure to the left shows a Latin square
of order 3 and its associated line array, where R, C and E stand for rows,
columns and elements.

$$\begin{array}{|c|c|c|}\hline 1&3&2\\\hline 2&1&3\\\hline 3&2&1\\\hline\end{array}\qquad\begin{array}{r|ccccccccc} R & 1&1&1&2&2&2&3&3&3\\ C & 1&2&3&1&2&3&1&2&3\\ E & 1&3&2&2&1&3&3&2&1\end{array}$$

The condition on the Latin square is equivalent to saying that in any two
lines of the line array all $n^2$ ordered pairs appear (and therefore each pair
appears exactly once). Clearly, we may permute the symbols in each line
arbitrarily (corresponding to permutations of rows, columns or elements)
and still obtain a Latin square. But the condition on the $3 \times n^2$ array tells
us more: There is no special role for the elements. We may also permute
the lines of the array (as a whole) and still preserve the conditions on the
line array and hence obtain a Latin square. Latin squares that are connected by any such permutation are called conjugates. Here is the observation which will make the proof transparent:
A partial Latin square obviously corresponds to a partial line array (every
pair appears at most once in any two lines), and any conjugate of a partial
Latin square is again a partial Latin square. In particular, a partial Latin
square can be completed if and only if any conjugate can be completed (just
complete the conjugate and then reverse the permutation of the three lines).

We will need two results, due to Herbert J. Ryser and to Charles C. Lindner,
that were known prior to Smetaniuk's theorem. If a partial Latin square is
of the form that the first $r$ rows are completely filled and the remaining cells
are empty, then we speak of an $r \times n$ Latin rectangle.

If we permute the lines of the above example cyclically, $R \to C \to E \to R$, then we obtain the following line array and Latin square:

$$\begin{array}{r|ccccccccc} R & 1&3&2&2&1&3&3&2&1\\ C & 1&1&1&2&2&2&3&3&3\\ E & 1&2&3&1&2&3&1&2&3\end{array}\qquad\begin{array}{|c|c|c|}\hline 1&2&3\\\hline 3&1&2\\\hline 2&3&1\\\hline\end{array}$$
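The line array and its conjugates can be computed directly. The sketch below (our own; the names are hypothetical) handles completely filled squares only, and reproduces the cyclic example $R \to C \to E \to R$ shown above.

```python
from itertools import permutations
from math import isqrt


def line_array(square):
    """The 3 x n^2 line array (R, C, E), with rows and columns numbered from 1."""
    n = len(square)
    triples = [(i + 1, j + 1, square[i][j]) for i in range(n) for j in range(n)]
    return [list(line) for line in zip(*triples)]


def from_line_array(lines):
    """Read the three lines as (row, column, element) and rebuild the square."""
    n = isqrt(len(lines[0]))
    square = [[None] * n for _ in range(n)]
    for r, c, e in zip(*lines):
        square[r - 1][c - 1] = e
    return square


def conjugate(square, perm):
    """Permute the lines: new line k is old line perm[k] (0 = R, 1 = C, 2 = E)."""
    lines = line_array(square)
    return from_line_array([lines[k] for k in perm])


def is_latin(square):
    n = len(square)
    symbols = set(range(1, n + 1))
    return (all(set(row) == symbols for row in square)
            and all({row[j] for row in square} == symbols for j in range(n)))


L = [[1, 3, 2], [2, 1, 3], [3, 2, 1]]
# cyclic shift R -> C -> E -> R: new R = old E, new C = old R, new E = old C
print(conjugate(L, (2, 0, 1)))   # [[1, 2, 3], [3, 1, 2], [2, 3, 1]]
```

All six permutations of the lines give Latin squares again, as the test confirms for this example.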


Lemma 1. Any $r \times n$ Latin rectangle, $r < n$, can be extended to an $(r+1) \times n$ Latin rectangle and hence can be completed to a Latin square.


Proof. We apply Hall's theorem (see Chapter 30). Let $A_j$ be the set
of numbers that do not appear in column $j$. An admissible $(r+1)$-st row
corresponds then precisely to a system of distinct representatives for the
collection $A_1,\dots,A_n$. To prove the lemma we therefore have to verify
Hall's condition (H). Every set $A_j$ has size $n-r$, and every element is in
precisely $n-r$ sets $A_j$ (since it appears $r$ times in the rectangle). Any $m$
of the sets $A_j$ contain together $m(n-r)$ elements, counted with multiplicity, and therefore, since no element lies in more than $n-r$ of them, at least $m$
different ones, which is just condition (H).
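Lemma 1 only asserts that a system of distinct representatives exists. To actually compute one, the sketch below uses the standard augmenting-path method for bipartite matching. That algorithm is our choice and is not part of the text, and the function names are hypothetical.

```python
def extend_latin_rectangle(rows):
    """Append one row to an r x n Latin rectangle (r < n) with entries 1..n.

    A_j = elements missing from column j; a new row is a system of distinct
    representatives of A_1, ..., A_n, found here with augmenting paths.
    """
    n = len(rows[0])
    allowed = [set(range(1, n + 1)) - {row[j] for row in rows} for j in range(n)]
    column_of = {}                      # element -> column it is currently assigned to

    def assign(col, visited):
        for x in allowed[col]:
            if x not in visited:
                visited.add(x)
                # x is unused, or the column holding x can switch to another element
                if x not in column_of or assign(column_of[x], visited):
                    column_of[x] = col
                    return True
        return False

    for col in range(n):
        if not assign(col, set()):
            raise ValueError("Hall's condition fails")   # impossible by Lemma 1
    new_row = [0] * n
    for x, col in column_of.items():
        new_row[col] = x
    return rows + [new_row]


def complete_latin_rectangle(rows):
    while len(rows) < len(rows[0]):
        rows = extend_latin_rectangle(rows)
    return rows


def is_latin(square):
    n = len(square)
    symbols = set(range(1, n + 1))
    return (all(set(row) == symbols for row in square)
            and all({row[j] for row in square} == symbols for j in range(n)))


rect = [[1, 2, 3, 4, 5], [3, 4, 5, 1, 2]]
print(complete_latin_rectangle(rect))
```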


Lemma 2. Let $P$ be a partial Latin square of order $n$ with at most $n-1$
cells filled and at most $\frac{n}{2}$ distinct elements. Then $P$ can be completed to a
Latin square of order $n$.
Proof. We first transform the problem into a more convenient form.
By the conjugacy principle discussed above, we may replace the condition "at most $\frac{n}{2}$ distinct elements" by the condition that the entries appear
in at most $\frac{n}{2}$ rows, and we may further assume that these rows are the top
rows. So let the rows with filled cells be the rows $1, 2, \dots, r$, with $f_i$ filled
cells in row $i$, where $r \le \frac{n}{2}$ and $\sum_{i=1}^r f_i \le n-1$. By permuting the rows,
we may assume that $f_1 \ge f_2 \ge \cdots \ge f_r$. Now we complete the rows
$1, \dots, r$ step by step until we reach an $r \times n$ rectangle which can then be
extended to a Latin square by Lemma 1.

Suppose we have already filled rows $1, 2, \dots, \ell-1$. In row $\ell$ there are $f_\ell$
filled cells which we may assume to be at the end. The current situation is
depicted in the figure, where the shaded part indicates the filled cells.

The completion of row $\ell$ is performed by another application of Hall's
theorem, but this time it is quite subtle. Let $X$ be the set of elements that
do not appear in row $\ell$, thus $|X| = n - f_\ell$, and for $j = 1, \dots, n - f_\ell$
let $A_j$ denote the set of those elements in $X$ which do not appear in
column $j$ (neither above nor below row $\ell$). Hence in order to complete
row $\ell$ we must verify condition (H) for the collection $A_1, \dots, A_{n-f_\ell}$.

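First we claim
$$n - f_\ell - \ell + 1 > \ell - 1 + f_{\ell+1} + \cdots + f_r. \tag{1}$$
The case $\ell = 1$ is clear. Otherwise $\sum_{i=1}^r f_i < n$, $f_1 \ge \cdots \ge f_r$ and
$1 < \ell \le r$ together imply
$$n > \sum_{i=1}^r f_i \ge (\ell-1)f_{\ell-1} + f_\ell + \cdots + f_r.$$
Now either $f_{\ell-1} \ge 2$ (in which case (1) holds) or $f_{\ell-1} = 1$. In the latter
case $f_\ell = \cdots = f_r = 1$, and (1) reduces to $n > 2(\ell-1) + r - \ell + 1 = r + \ell - 1$, which is true
because of $\ell \le r \le \frac{n}{2}$.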

Let us now take $m$ sets $A_j$, $1 \le m \le n - f_\ell$, and let $B$ be their union.
We must show $|B| \ge m$. Consider the number $c$ of cells in the $m$ columns
corresponding to the $A_j$'s which contain elements of $X$. There are at most
$(\ell-1)m$ such cells above row $\ell$ and at most $f_{\ell+1} + \cdots + f_r$ below row $\ell$,
and thus
$$c \le (\ell-1)m + f_{\ell+1} + \cdots + f_r.$$
On the other hand, each element $x \in X \setminus B$ appears in each of the $m$
columns, hence $c \ge m(|X| - |B|)$, and therefore (with $|X| = n - f_\ell$)
$$|B| \ge |X| - \frac{c}{m} \ge n - f_\ell - (\ell-1) - \frac{1}{m}\,(f_{\ell+1} + \cdots + f_r).$$
It follows that $|B| \ge m$ if
$$n - f_\ell - (\ell-1) - \frac{1}{m}\,(f_{\ell+1} + \cdots + f_r) > m - 1,$$
that is, if
$$m\,(n - f_\ell - \ell + 2 - m) > f_{\ell+1} + \cdots + f_r. \tag{2}$$


[Figure: A situation for $n = 8$, with $\ell = 3$, $f_1 = f_2 = f_3 = 2$, $f_4 = 1$. The dark squares
represent the pre-filled cells, the lighter
ones show the cells that have been filled
in the completion process.]
Inequality (2) is true for $m = 1$ and for $m = n - f_\ell - \ell + 1$ by (1), and hence
for all values $m$ between 1 and $n - f_\ell - \ell + 1$, since the left-hand side is
a quadratic function in $m$ with leading coefficient $-1$. The remaining case
is $m > n - f_\ell - \ell + 1$. Since any element $x$ of $X$ is contained in at most
$\ell - 1 + f_{\ell+1} + \cdots + f_r$ rows, it can also appear in at most that many
columns. Invoking (1) once more, we find that $x$ is in one of the sets $A_j$, so
in this case $B = X$, $|B| = n - f_\ell \ge m$, and the proof is complete.
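Because inequalities (1) and (2) are easy to get wrong by one, a brute-force check over all admissible data for small $n$ may be reassuring. The sketch (our own; the names are hypothetical) runs over all $r \le \frac{n}{2}$ and all $f_1 \ge \cdots \ge f_r \ge 1$ with $\sum f_i \le n-1$, and tests (1) for every $\ell$ and (2) for $1 \le m \le n - f_\ell - \ell + 1$.

```python
def nonincreasing_sequences(r, total_max, upper):
    """All tuples f_1 >= ... >= f_r >= 1 with f_1 <= upper and f_1 + ... + f_r <= total_max."""
    if r == 0:
        yield ()
        return
    for first in range(min(upper, total_max), 0, -1):
        for rest in nonincreasing_sequences(r - 1, total_max - first, first):
            yield (first,) + rest


def check_inequalities(n):
    """Check (1), and (2) for 1 <= m <= n - f_ell - ell + 1, over all admissible data."""
    for r in range(1, n // 2 + 1):                            # r <= n/2
        for f in nonincreasing_sequences(r, n - 1, n - 1):    # sum of f_i <= n - 1
            for ell in range(1, r + 1):
                f_ell, tail = f[ell - 1], sum(f[ell:])        # tail = f_{ell+1} + ... + f_r
                if not n - f_ell - ell + 1 > ell - 1 + tail:  # inequality (1)
                    return False
                for m in range(1, n - f_ell - ell + 2):
                    if not m * (n - f_ell - ell + 2 - m) > tail:   # inequality (2)
                        return False
    return True


print(all(check_inequalities(n) for n in range(2, 16)))   # True
```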


Let us finally prove Smetaniuk’s theorem. 


Theorem. Any partial Latin square of order n with at most n — 1 filled 
cells can be completed to a Latin square of the same order. 


Proof. We use induction on $n$, the cases $n \le 2$ being trivial. Thus we
now study a partial Latin square of order $n \ge 3$ with at most $n-1$ filled
cells. With the notation used above these cells lie in $r \le n-1$ different
rows numbered $s_1, \dots, s_r$, which contain $f_1, \dots, f_r > 0$ filled cells, with
$\sum_{i=1}^r f_i \le n-1$. By Lemma 2 we may assume that there are more than
$\frac{n}{2}$ different elements; thus there is an element that appears only once: after
renumbering and permutation of rows (if necessary) we may assume that
the element $n$ occurs only once, and this is in row $s_1$.

In the next step we want to permute the rows and columns of the partial
Latin square such that after the permutations all the filled cells lie below
the diagonal, except for the cell filled with $n$, which will end up on the
diagonal. (The diagonal consists of the cells $(k,k)$ with $1 \le k \le n$.) We
achieve this as follows: First we permute row $s_1$ into the position $f_1$. By
permutation of columns we move all the filled cells to the left, so that $n$
occurs as the last element in its row, on the diagonal. Next we move row
$s_2$ into position $1 + f_1 + f_2$, and again the filled cells as far to the left
as possible. In general, for $1 < i \le r$ we move the row $s_i$ into position
$1 + f_1 + f_2 + \cdots + f_i$ and the filled cells as far left as possible. This clearly
gives the desired set-up. The drawing to the left shows an example, with
$n = 7$: the rows $s_1 = 2$, $s_2 = 3$, $s_3 = 5$ and $s_4 = 7$ with $f_1 = f_2 = 2$
and $f_3 = f_4 = 1$ are moved into the rows numbered 2, 5, 6 and 7, and the
columns are permuted "to the left" so that in the end all entries except for
the single 7 come to lie below the diagonal, which is marked by dots.


In order to be able to apply induction we now remove the entry $n$ from
the diagonal and ignore the first row and the last column (which do not
contain any filled cells): thus we are looking at a partial Latin square
of order $n-1$ with at most $n-2$ filled cells, which by induction can be
completed to a Latin square of order $n-1$. The margin shows one (of
many) completions of the partial Latin square that arises in our example.
In the figure, the original entries are printed bold. They are already final,
as are all the elements in shaded cells; some of the other entries will be
changed in the following, in order to complete the Latin square of order $n$.

In the next step we want to move the diagonal elements of the square to
the last column and put entries $n$ onto the diagonal in their place. However, in general we cannot do this, since the diagonal elements need not
be distinct. Thus we proceed more carefully and perform successively, for
$k = 2, 3, \dots, n-1$ (in this order), the following operation:

Put the value $n$ into the cell $(k,n)$. This yields a correct partial Latin
square. Now exchange the value $x_k$ in the diagonal cell $(k,k)$ with the
value $n$ in the cell $(k,n)$ in the last column.

If the value $x_k$ did not already occur in the last column, then our job for the
current $k$ is completed. After this, the current elements in the $k$-th column
will not be changed any more.


In our example this works without problems for k = 2, 3 and 4, and the 
corresponding diagonal elements 3, 1 and 6 move to the last column. The 
following three figures show the corresponding operations. 


Now we have to treat the case in which there is already an element $x_k$ in
the last column. In this case we proceed as follows:

If there is already an element $x_k$ in a cell $(j,n)$ with $2 \le j < k$, then we
exchange in row $j$ the element $x_k$ in the $n$-th column with the element $x'_k$
in the $k$-th column. If the element $x'_k$ also occurs in a cell $(j',n)$, then we
also exchange the elements in the $j'$-th row that occur in the $n$-th and in the
$k$-th columns, and so on.


If we proceed like this there will never be two equal entries in a row. Our
exchange process ensures that there also will never be two equal elements in
a column. So we only have to verify that the exchange process between the
$k$-th and the $n$-th column does not lead to an infinite loop. This can be seen
from the following bipartite graph $G_k$: Its vertices correspond to the cells
$(i,k)$ and $(j,n)$ with $2 \le i, j \le k$ whose elements might be exchanged.
There is an edge between $(i,k)$ and $(j,n)$ if these two cells lie in the same
row (that is, for $i = j$), or if the cells before the exchange process contain
the same element (which implies $i \ne j$). In our sketch the edges for $i = j$
are dotted, the others are not. All vertices in $G_k$ have degree 1 or 2. The
cell $(k,n)$ corresponds to a vertex of degree 1; this vertex is the beginning
of a path which leads to column $k$ on a horizontal edge, then possibly on a
sloped edge back to column $n$, then horizontally back to column $k$, and so
on. It ends in column $k$ at a value that does not occur in column $n$. Thus the
exchange operations will end at some point with a step where we move a
new element into the last column. Then the work on column $k$ is completed,
and the elements in the cells $(i,k)$ for $i \ge 2$ are fixed for good.
In our example the "exchange case" happens for $k = 5$: the element $x_5 = 3$
does already occur in the last column, so that entry has to be moved back
to column $k = 5$. But the exchange element $x'_5 = 6$ is not new either; it is
exchanged by $x''_5 = 5$, and this one is new.


Finally, the exchange for $k = 6 = n-1$ poses no problem, and after that
the completion of the Latin square is unique:


...and the same occurs in general: We put the element $n$ into the cell $(n,n)$,
and after that the first row can be completed by the missing elements of the
respective columns (see Lemma 1), and this completes the proof. In order
to get explicitly a completion of the original partial Latin square of order $n$,
we only have to reverse the element, row and column permutations of the
first two steps of the proof.
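Smetaniuk's construction takes some care to implement. As a much cruder sanity check of the theorem (not of his method), the following sketch completes random partial Latin squares with $n-1$ filled cells by plain backtracking. It also confirms that a configuration with $n$ filled cells of the kind described earlier, where the upper right-hand corner cannot be filled, has no completion. All names are hypothetical, and backtracking takes exponential time in general, so it is only meant for small $n$.

```python
import random


def is_latin(square):
    n = len(square)
    symbols = set(range(1, n + 1))
    return (all(set(row) == symbols for row in square)
            and all({row[j] for row in square} == symbols for j in range(n)))


def complete_partial_latin(grid):
    """Backtracking search; 0 marks an empty cell. Returns a completion or None."""
    n = len(grid)
    g = [row[:] for row in grid]
    in_row = [set(row) - {0} for row in g]
    in_col = [{g[i][j] for i in range(n)} - {0} for j in range(n)]
    empty = [(i, j) for i in range(n) for j in range(n) if g[i][j] == 0]

    def solve(k):
        if k == len(empty):
            return True
        i, j = empty[k]
        for v in range(1, n + 1):
            if v not in in_row[i] and v not in in_col[j]:
                g[i][j] = v
                in_row[i].add(v)
                in_col[j].add(v)
                if solve(k + 1):
                    return True
                in_row[i].discard(v)
                in_col[j].discard(v)
        g[i][j] = 0            # no value fits here: undo and backtrack
        return False

    return g if solve(0) else None


def random_partial(n, filled, rng=None):
    """A random partial Latin square of order n with exactly `filled` cells."""
    rng = rng or random.Random()
    grid = [[0] * n for _ in range(n)]
    placed = 0
    while placed < filled:
        i, j, v = rng.randrange(n), rng.randrange(n), rng.randint(1, n)
        if grid[i][j] == 0 and v not in grid[i] and all(row[j] != v for row in grid):
            grid[i][j] = v
            placed += 1
    return grid


# n = 4 filled cells can already be too many: the upper right-hand corner has no legal value
blocked = [[1, 2, 3, 0], [0, 0, 0, 4], [0, 0, 0, 0], [0, 0, 0, 0]]
print(complete_partial_latin(blocked))   # None
```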
Permanents and the power of entropy


In Chapter 24 we discussed Van der Waerden’s conjecture, which estab- 
lished a lower bound for the permanent of a doubly stochastic matrix. There 
is also a wonderful theorem that gives an upper bound for integral matrices 
with prescribed row sums. 

Consider an $n \times n$ matrix $M = (m_{ij})$ with 0/1-entries. To $M$ we associate
a simple bipartite graph $G_M = (U \cup V, E)$, whose vertices are given by
$U = \{u_1,\dots,u_n\}$ and $V = \{v_1,\dots,v_n\}$, and where
$$u_iv_j \in E \iff m_{ij} = 1.$$

Conversely, every bipartite graph $G$ on $n + n$ nodes gives rise to a square
0/1-matrix $M$ of size $n \times n$ with $G = G_M$. Now look at the permanent
$$\operatorname{per} M = \sum_{\sigma} m_{1\sigma(1)}\, m_{2\sigma(2)} \cdots m_{n\sigma(n)},$$
where $\sigma$ runs through all permutations of $\{1,\dots,n\}$.
Every term $m_{1\sigma(1)} m_{2\sigma(2)} \cdots m_{n\sigma(n)}$ equals 0 or 1, and it is equal to 1 if
and only if the set of edges $\{u_1v_{\sigma(1)}, \dots, u_nv_{\sigma(n)}\}$ is a perfect matching
of $G_M$, that is, a set of edges that covers each vertex exactly once. Hence
the number $m(G_M)$ of perfect matchings in $G_M$ is just the permanent, that
is, $\operatorname{per} M = m(G_M)$.
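The equality $\operatorname{per} M = m(G_M)$ is easy to confirm by brute force. The sketch below (our own, with hypothetical names and exponential running time, so only for small $n$) computes the permanent from the definition and, independently, counts perfect matchings by matching $u_1, u_2, \dots$ in turn.

```python
from itertools import permutations
from math import prod


def permanent(M):
    """per M = sum over permutations sigma of m_{1 sigma(1)} ... m_{n sigma(n)} (n! terms)."""
    n = len(M)
    return sum(prod(M[i][sigma[i]] for i in range(n)) for sigma in permutations(range(n)))


def perfect_matchings(M):
    """Count perfect matchings of G_M: match u_1, u_2, ... in turn to an unused neighbour v_j."""
    n = len(M)

    def count(i, used):
        if i == n:
            return 1
        return sum(count(i + 1, used | {j}) for j in range(n) if M[i][j] and j not in used)

    return count(0, frozenset())


M = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
print(permanent(M), perfect_matchings(M))   # 2 2
```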

The correspondence $G \leftrightarrow M_G$ stimulated a lot of the early research on
permanents. One of the first difficult problems was a conjecture posed by
Henryk Minc in 1967: Suppose the 0/1-matrix $M$ has row sums $d_1, \dots, d_n$
(or equivalently the vertices $u_1, \dots, u_n$ have degrees $d_1, \dots, d_n$); then
$$\operatorname{per} M \le \prod_{i=1}^n (d_i!)^{1/d_i}.$$
Observe that we can have equality, as seen from the example in the margin.


Minc’s conjecture was proved by Lev M. Brégman in 1973. A few years 
later Alexander Schrijver gave a short and sweet proof, with a randomized 
version appearing in the book of Alon and Spencer. But in our view the 
proof straight from the BOOK is due to Jaikumar Radhakrishnan. It is 
not much different, but it uses just the right tool — entropy from informa- 
tion theory. Before we come to this, let us state Brégman’s theorem again. 
The all-1's matrix $J_n$ corresponds to
the complete bipartite graph $K_{n,n}$, with
$\operatorname{per}(J_n) = m(K_{n,n}) = n!$.


If $k$ divides $n$, the block diagonal matrix
$$M = \begin{pmatrix} J_k & & \\ & \ddots & \\ & & J_k \end{pmatrix}$$
with $n/k$ blocks has $d_1 = \cdots = d_n = k$
and $\operatorname{per} M = (k!)^{n/k}$.
[Figure: facsimile of the opening page of C. E. Shannon’s paper “A Mathematical Theory of Communication”]

It is said that Shannon, following the advice of John von Neumann, used the name “entropy” because nobody knew exactly what this meant anyway ...


[Figure: graph of $H(X_{p,1-p})$ as a function of $p$]


Theorem 1. Let $M = (m_{ij})$ be an $n \times n$ matrix with entries in $\{0,1\}$, and let $d_1, \ldots, d_n$ be the row sums of $M$, that is, $d_i = \sum_{j=1}^n m_{ij}$. Then
$$\operatorname{per} M \le \prod_{i=1}^n (d_i!)^{1/d_i}.$$
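To see Theorem 1 at work on small cases, here is a brute-force Python sketch of our own (the function names are hypothetical). It computes the permanent straight from its definition, which is only feasible for small $n$, and compares it with the bound $\prod_i (d_i!)^{1/d_i}$. The two margin examples, $J_n$ and the block diagonal matrix, attain the bound.

```python
from itertools import permutations
from math import factorial, prod


def permanent(M):
    """Permanent by its definition: sum over all permutations s of prod_i M[i][s(i)].
    Takes n! steps, so only for small n."""
    n = len(M)
    return sum(prod(M[i][s[i]] for i in range(n)) for s in permutations(range(n)))


def bregman_bound(M):
    """Right-hand side of Theorem 1: prod_i (d_i!)^(1/d_i) with d_i the row sums.
    A zero row forces per M = 0, so we return 0 in that case."""
    bound = 1.0
    for row in M:
        d = sum(row)
        if d == 0:
            return 0.0
        bound *= factorial(d) ** (1.0 / d)
    return bound


def all_ones(n):
    """The matrix J_n."""
    return [[1] * n for _ in range(n)]


def block_diagonal(n, k):
    """n x n block diagonal matrix with n/k blocks J_k (assumes k divides n)."""
    return [[1 if i // k == j // k else 0 for j in range(n)] for i in range(n)]
```

For instance, `permanent([[1, 1, 0], [0, 1, 1], [1, 0, 1]])` is 2, while the bound gives $2^{3/2} \approx 2.83$.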


It does not happen often that a single paper gives birth to a whole field. 
Claude Shannon’s A Mathematical Theory of Communication from 1948 
was such a singular achievement: It laid the foundations of information 
theory and coding, and thereby initiated one of the great mathematical 
success stories of the twentieth century. 

Suppose $X$ is a random variable taking values in $\{a_1, \ldots, a_n\}$ with probabilities $\mathrm{Prob}(X = a_i) = p_i$. It helps to think of $X$ as an experiment with possible outcomes $a_i$, like throwing a die with outcomes $1, 2, \ldots, 6$. How much information do we receive (on the average) from performing the experiment? Shannon’s ingenious idea was the “equation”


information after = uncertainty before. 


For example, when a coin is rigged and heads comes up most of the time, 
then there is little information to be gained from throwing it, certainly less 
than when the coin is fair, in which case the uncertainty (and information) 
is largest. 

By postulating certain natural conditions that an uncertainty measure for $X$ should satisfy, Shannon arrived at his famous definition of entropy, which he denoted by $H(X)$:
$$H(X) = H(X_{p_1, \ldots, p_n}) = -\sum_{i=1}^n p_i \log_2 p_i.$$


For example, if $X$ is a throw of a biased coin with $\mathrm{Prob}(X = \text{heads}) = p$, then the Shannon formula yields the function $H(X_{p,1-p}) = -p\log_2 p - (1-p)\log_2(1-p)$, graphed in the margin.
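A minimal Python sketch of this binary entropy function, assuming $p$ is given as a float. It shows that the fair coin ($p = 1/2$) gives the most information, and a rigged coin less.

```python
from math import log2


def binary_entropy(p):
    """H(X_{p,1-p}) = -p log2 p - (1-p) log2 (1-p), using 0 * log2 0 = 0."""
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)


# A fair coin versus increasingly rigged ones
for p in (0.5, 0.7, 0.9, 0.99):
    print(p, round(binary_entropy(p), 4))
```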

In the following we always use the binary logarithm $\log_2 p$ with the convention $p\log_2 p = 0$ for $p = 0$. The support of the random variable $X$ is $\operatorname{supp} X := \{a : \mathrm{Prob}(X = a) > 0\}$.
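The definition of $H(X)$, together with the convention $0 \log_2 0 = 0$, translates directly into code. In this sketch (ours, not from the text) a distribution is a Python dictionary mapping outcomes to probabilities, and the hypothetical helper `support` returns $\operatorname{supp} X$.

```python
from math import log2


def entropy(dist):
    """H(X) = -sum p log2 p for a distribution {outcome: probability};
    zero probabilities are skipped, which is the convention 0 * log2 0 = 0."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def support(dist):
    """supp X = {a : Prob(X = a) > 0}."""
    return {a for a, p in dist.items() if p > 0}


fair_die = {k: 1 / 6 for k in range(1, 7)}
print(entropy(fair_die))          # close to log2(6)
```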

Later in his paper Shannon gave an alternative interpretation of H(X) as 
the expected length of an optimal question strategy for the outcome of X. 
The appendix to this chapter contains a sketch of this approach. 


Suppose $X$ and $Y$ are two random variables with value ranges $\{a_1, \ldots, a_m\}$ and $\{b_1, \ldots, b_n\}$. A key ingredient for Radhakrishnan’s proof is the concept of conditional entropy of $Y$ under knowledge of $X$. To shorten the writing, let us set $p(a_i) := \mathrm{Prob}(X = a_i)$, $p(b_j) := \mathrm{Prob}(Y = b_j)$, and similarly $p(a_i, b_j) := \mathrm{Prob}(X = a_i \wedge Y = b_j)$ for the joint distribution of the pair $(X, Y)$, which may be viewed as a single random variable, and $p(b_j \mid a_i) := \mathrm{Prob}(Y = b_j \mid X = a_i)$ for the conditional probabilities. Let


$$H(Y \mid a_i) = -\sum_{j=1}^n p(b_j \mid a_i)\log_2 p(b_j \mid a_i)$$
be the entropy (uncertainty) of $Y$ if we know that the outcome of $X$ is $a_i$. Now we take the expected value of this quantity over all possible outcomes of $X$ and thus arrive at


$$H(Y \mid X) = \sum_{i=1}^m p(a_i)\,H(Y \mid a_i)$$
as the conditional entropy of $Y$ under knowledge of $X$.
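Here is a sketch of $H(Y \mid X)$ computed straight from this definition, assuming the joint distribution is stored as a dictionary `{(a, b): p(a, b)}`. The function name is our own.

```python
from collections import defaultdict
from math import log2


def conditional_entropy(joint):
    """H(Y|X) = sum_i p(a_i) H(Y|a_i) for a joint distribution {(a, b): p(a, b)}."""
    p_x = defaultdict(float)
    for (a, _), p in joint.items():
        p_x[a] += p                     # marginal p(a_i)
    h = 0.0
    for a, pa in p_x.items():
        if pa == 0:
            continue
        # H(Y | a) = -sum_b p(b|a) log2 p(b|a), with p(b|a) = p(a, b) / p(a)
        h_given_a = -sum((p / pa) * log2(p / pa)
                         for (a2, _), p in joint.items() if a2 == a and p > 0)
        h += pa * h_given_a
    return h


# Y a copy of a fair coin X: no uncertainty left once X is known
print(conditional_entropy({(0, 0): 0.5, (1, 1): 0.5}))
```

Replacing $Y$ by an independent fair coin gives $H(Y \mid X) = 1$ instead, matching the margin remark that $H(Y \mid X) = 0$ exactly when $Y$ is determined by $X$.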
All we need for the proof of Brégman’s theorem are three facts about 
entropy, whose (easy) proofs are given in the appendix; the rest is clever 
and beautiful probabilistic reasoning. Here are the facts: 


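(A) $H(X) \le \log_2(|\operatorname{supp} X|)$, with equality if and only if $X$ is uniformly distributed on the support of $X$, that is, $\mathrm{Prob}(X = a) = \frac{1}{n}$ for $a \in \operatorname{supp} X$, where $n = |\operatorname{supp} X|$.

(B) $H(X, Y) = H(X) + H(Y \mid X)$, and more generally $H(X_1, \ldots, X_n) = H(X_1) + H(X_2 \mid X_1) + \cdots + H(X_n \mid X_1, \ldots, X_{n-1})$.

(C) If $\operatorname{supp} X$ is partitioned into the $d$ sets $E_1, \ldots, E_d$, where $E_j = \{a \in \operatorname{supp} X : |\operatorname{supp}(Y \mid a)| = j\}$, then
$$H(Y \mid X) \le \sum_{j=1}^d \mathrm{Prob}(X \in E_j)\log_2 j.$$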
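Facts (A), (B) and (C) can be checked numerically on any small example. The joint distribution below is made up purely for illustration; the script evaluates both sides of each statement, with a small floating-point tolerance.

```python
from collections import defaultdict
from math import log2


def H(dist):
    """Entropy of {outcome: probability}, skipping zero probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal_x(joint):
    px = defaultdict(float)
    for (a, _), p in joint.items():
        px[a] += p
    return px


def conditional(joint, a, pa):
    """The conditional distribution of Y given X = a."""
    return {b: p / pa for (a2, b), p in joint.items() if a2 == a}


def cond_entropy(joint):
    px = marginal_x(joint)
    return sum(pa * H(conditional(joint, a, pa)) for a, pa in px.items() if pa > 0)


# A small, made-up joint distribution p(a_i, b_j)
joint = {("a1", "b1"): 0.25, ("a1", "b2"): 0.25,
         ("a2", "b1"): 0.30,
         ("a3", "b1"): 0.05, ("a3", "b2"): 0.05, ("a3", "b3"): 0.10}
px = marginal_x(joint)

# (A): H(X) <= log2 |supp X|
fact_a = H(px) <= log2(sum(1 for p in px.values() if p > 0)) + 1e-12
# (B): H(X, Y) = H(X) + H(Y | X); the pair (X, Y) is treated as one random variable
fact_b = abs(H(joint) - (H(px) + cond_entropy(joint))) < 1e-12
# (C): with j(a) = |supp(Y | a)|, H(Y | X) <= sum_a p(a) log2 j(a);
#      grouping the a's by the value of j(a) gives exactly the sum over the sets E_j
supp_y = {a: sum(1 for (a2, _), p in joint.items() if a2 == a and p > 0) for a in px}
fact_c = cond_entropy(joint) <= sum(px[a] * log2(supp_y[a]) for a in px) + 1e-12
print(fact_a, fact_b, fact_c)
```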


■ Proof of Theorem 1. Let $G = (U \cup V, E)$ be the bipartite graph associated with $M$, where $d_i$ is the degree of the vertex $u_i$, and denote by $\mathcal{S}$ the set of perfect matchings of $G$. As $\operatorname{per} M = m(G) = |\mathcal{S}|$, we will prove the upper bound of the theorem for the number of perfect matchings of $G$. We may assume $\mathcal{S} \ne \emptyset$ because otherwise there is nothing to show. We view each $\sigma \in \mathcal{S}$ as the corresponding permutation $\sigma(1)\sigma(2)\ldots\sigma(n)$ of the indices. Hence the vertex $u_i \in U$ is matched to $v_{\sigma(i)} \in V$ under $\sigma$.


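The first idea is to pick $\sigma \in \mathcal{S}$ uniformly at random and to consider the vector of random variables $X = (X_1, \ldots, X_n) = (\sigma(1), \ldots, \sigma(n))$. By (A),
$$H(\sigma(1), \ldots, \sigma(n)) = \log_2(|\mathcal{S}|);$$
hence it suffices to show that
$$H(\sigma(1), \ldots, \sigma(n)) \le \log_2\Big(\prod_{i=1}^n (d_i!)^{1/d_i}\Big) = \sum_{i=1}^n \frac{1}{d_i}\log_2(d_i!). \qquad (1)$$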


In particular, H(Y |X) = 0 if and 
only if the outcome of Y is determined 
once the result of X is known. 
[Figure: a bipartite graph on $u_1, \ldots, u_4$ and $v_1, \ldots, v_4$ whose perfect matchings are $\mathcal{S} = \{1243, 2143, 4132, 4231\}$]


Next we use (B) to get
$$H(\sigma(1), \ldots, \sigma(n)) = \sum_{i=1}^n H(\sigma(i) \mid \sigma(1), \ldots, \sigma(i-1)). \qquad (2)$$

Let’s find out what the conditional entropy $H(\sigma(i) \mid \sigma(1), \ldots, \sigma(i-1))$ means. It measures the uncertainty about the matching mate of $u_i$ after the mates of $u_1, \ldots, u_{i-1}$ have been revealed. In particular, the support of the random variable $\sigma(i)$ under knowledge of $(\sigma(1), \ldots, \sigma(i-1))$ is contained in the set of indices of the neighbors of $u_i$ that have not already been matched to one of $u_1, \ldots, u_{i-1}$.


For example, let us check the formula in (B) for the graph in the margin, which has $|\mathcal{S}| = 4$. Since all permutations in $\mathcal{S}$ are equally likely, we have $H(\sigma(1), \ldots, \sigma(4)) = \log_2 4 = 2$. Now, $H(\sigma(1)) = -\frac14\log_2\frac14 - \frac14\log_2\frac14 - \frac12\log_2\frac12 = \frac32$. Let us compute the conditional entropy $H(\sigma(2) \mid \sigma(1))$: For $\sigma(1) = 1$ we get $H(\sigma(2) \mid 1) = 0$ since $\sigma(2) = 2$ is then determined; similarly $H(\sigma(2) \mid 2) = 0$, but for $\sigma(1) = 4$ we have $H(\sigma(2) \mid 4) = 1$, since there are two equally likely outcomes $\sigma(2) = 1$, $\sigma(2) = 2$. For the expected value we thus compute $H(\sigma(2) \mid \sigma(1)) = \frac12 \cdot 1 = \frac12$. The next conditional entropies $H(\sigma(3) \mid \sigma(1), \sigma(2))$ and $H(\sigma(4) \mid \sigma(1), \sigma(2), \sigma(3))$ are both 0, since the values are determined. So summing up we again get $H(\sigma(1)) + H(\sigma(2) \mid \sigma(1)) + H(\sigma(3) \mid \sigma(1), \sigma(2)) + H(\sigma(4) \mid \sigma(1), \sigma(2), \sigma(3)) = \frac32 + \frac12 + 0 + 0 = 2$, in accordance with (B).
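This computation can be automated. The sketch below (our own) takes only the list of perfect matchings of the margin example, draws $\sigma$ uniformly from it, and evaluates each conditional entropy $H(\sigma(i) \mid \sigma(1), \ldots, \sigma(i-1))$ by grouping the matchings according to the revealed prefix.

```python
from collections import Counter, defaultdict
from math import log2

# The four perfect matchings of the margin example, written as sigma(1)...sigma(4)
S = ["1243", "2143", "4132", "4231"]


def entropy_of_counts(counts):
    """Entropy of the uniform choice among the items tallied in `counts`."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def chain_rule_terms(matchings):
    """[H(sigma(i) | sigma(1), ..., sigma(i-1)) for i = 1..n], sigma uniform on `matchings`."""
    n, N = len(matchings[0]), len(matchings)
    terms = []
    for i in range(n):
        by_prefix = defaultdict(Counter)       # revealed prefix -> counts of the next mate
        for s in matchings:
            by_prefix[s[:i]][s[i]] += 1
        # each prefix occurs with probability sum(counts) / N
        terms.append(sum(sum(c.values()) / N * entropy_of_counts(c)
                         for c in by_prefix.values()))
    return terms


terms = chain_rule_terms(S)
print(terms, sum(terms))
```

For the margin example the terms are $\frac32, \frac12, 0, 0$, and their sum is $\log_2 4 = 2$.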




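Radhakrishnan’s wonderful idea was to examine the vertices $u_1, \ldots, u_n$ in a random order $\tau$, where all $\tau$ are equally likely with probability $\frac{1}{n!}$, and then to take the average over the entropies. In other words, we reveal the matching mates in the order $\sigma(\tau(1)), \sigma(\tau(2)), \ldots, \sigma(\tau(n))$. Let us look at a fixed $\tau$. If $k_i = \tau^{-1}(i)$, that is, if in the ordering $\tau$ the vertex $u_i$ appears in $k_i$th place, then equation (2) becomes
$$H(\sigma(1), \ldots, \sigma(n)) = \sum_{i=1}^n H\big(\sigma(i) \mid \sigma(\tau(1)), \ldots, \sigma(\tau(k_i - 1))\big).$$
As this holds for all $\tau$, taking the average we get
$$H(\sigma(1), \ldots, \sigma(n)) = \frac{1}{n!}\sum_{\tau}\sum_{i=1}^n H\big(\sigma(i) \mid \sigma(\tau(1)), \ldots, \sigma(\tau(k_i - 1))\big).$$
Let us fix $\tau$ and look at a summand
$$H\big(\sigma(i) \mid \sigma(\tau(1)), \ldots, \sigma(\tau(k_i - 1))\big). \qquad (3)$$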


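To upper bound (3) we use fact (C) from above, applied to the random variables $X = (\sigma(\tau(1)), \ldots, \sigma(\tau(k_i - 1)))$ and $Y = \sigma(i)$. For each $\sigma$ let $N_i(\sigma, \tau)$ be the set of indices of the neighbors of $u_i$ that are not among $\{\sigma(\tau(1)), \ldots, \sigma(\tau(k_i - 1))\}$. Since $u_i$ has $d_i$ neighbors and $\sigma$ is a perfect matching we have $1 \le |N_i(\sigma, \tau)| \le d_i$ for all $\sigma$. Now partition $\operatorname{supp} X$ into the sets $E_j^{(i)}$, where $(\sigma(\tau(1)), \ldots, \sigma(\tau(k_i - 1)))$ lies in $E_j^{(i)}$ if and only if $|N_i(\sigma, \tau)| = j$, for $1 \le j \le d_i$. (On $E_j^{(i)}$ the support of $\sigma(i)$ is contained in $N_i(\sigma, \tau)$, so it has at most $j$ elements; the proof of (C) only uses this upper bound.) Considering $|N_i(\sigma, \tau)|$ as a random variable on $\mathcal{S}$, we thus have
$$\mathrm{Prob}(X \in E_j^{(i)}) = \mathrm{Prob}(|N_i(\sigma, \tau)| = j),$$
and fact (C) tells us that for fixed $\tau$
$$H\big(\sigma(i) \mid \sigma(\tau(1)), \ldots, \sigma(\tau(k_i - 1))\big) \le \sum_{j=1}^{d_i}\mathrm{Prob}(|N_i(\sigma, \tau)| = j)\log_2 j.$$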


Hence we get altogether
$$H(\sigma(1), \ldots, \sigma(n)) \le \frac{1}{n!}\sum_{i=1}^n\sum_{j=1}^{d_i}\log_2 j\sum_{\tau}\mathrm{Prob}(|N_i(\sigma, \tau)| = j). \qquad (4)$$

This seems to get more complicated as we go along, but wait! Looking at (1) it suffices to show that the innermost sum in (4) equals $\frac{n!}{d_i}$ for all $j$, because then the right-hand side simplifies to $\sum_{i=1}^n \frac{1}{d_i}\log_2(d_i!)$.

And this assertion about the inner sum is easy! Fix $\sigma$, and let $\ell_1, \ldots, \ell_{d_i}$ be the indices of the neighbors of $u_i$. Then $D_i = \{\sigma^{-1}(\ell_1), \ldots, \sigma^{-1}(\ell_{d_i})\}$ is the set of indices of the $U$-vertices that are matched onto the neighbors of $u_i$, including of course $i$ itself, and these vertices are revealed in the order that $\tau$ induces on $D_i$. If $i$ comes first in $D_i$, then no neighbors had been taken so far, whence $|N_i(\sigma, \tau)| = d_i$. If $i$ is second, then one neighbor is gone, thus $|N_i(\sigma, \tau)| = d_i - 1$, and so on.

Now the power of averaging comes into play. With $\tau$ running through all $n!$ permutations, all possible orderings of the list $D_i$ occur with equal frequency, which means that $i$ appears in each of the $d_i$ places of $D_i$ with the same frequency $\frac{n!}{d_i}$. But this, in turn, implies that $|N_i(\sigma, \tau)| = j$ occurs with frequency $\frac{n!}{d_i}$ for all $j$, and this holds for all $\sigma$, whence
$$\sum_{\tau}\mathrm{Prob}(|N_i(\sigma, \tau)| = j) = \frac{n!}{d_i}$$
for all $j$, and we are done.
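The averaging argument can be checked by brute force: for a fixed matching $\sigma$ and vertex $u_i$, run through all $n!$ orderings $\tau$ and count how often $|N_i(\sigma, \tau)| = j$. In this sketch the adjacency lists are an assumption of ours, namely the union of the edges of the four matchings listed for the margin example (all indices 0-based).

```python
from collections import Counter
from itertools import permutations
from math import factorial


def n_i_counts(sigma, neighbors, i):
    """For fixed sigma, count over all n! orderings tau how often |N_i(sigma, tau)| = j.

    sigma[u]     : the V-index matched to the U-vertex u (0-based)
    neighbors[i] : set of V-indices adjacent to u_i
    """
    n = len(sigma)
    counts = Counter()
    for tau in permutations(range(n)):
        k_i = tau.index(i)                           # u_i is revealed at position k_i
        revealed = {sigma[tau[t]] for t in range(k_i)}
        counts[len(neighbors[i] - revealed)] += 1    # |N_i(sigma, tau)|
    return counts


# Assumed adjacency: union of the matchings 1243, 2143, 4132, 4231 (0-based)
neighbors = [{0, 1, 3}, {0, 1}, {2, 3}, {0, 1, 2}]
matchings = [[0, 1, 3, 2], [1, 0, 3, 2], [3, 0, 2, 1], [3, 1, 2, 0]]
print(n_i_counts(matchings[2], neighbors, 0))   # each j = 1, 2, 3 occurs 4!/3 = 8 times
```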


We cannot end this chapter without deriving a stunning asymptotic formula 
for the number L(n) of Latin squares of order n. (See Chapter 36 for the 
definition of Latin squares.) The small examples 


L(1) =1, L(2) =2, L(3) = 12, L(4) =576, L(5) = 161280 
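The first few of these values can be reproduced with a simple backtracking search. This Python sketch (ours) fills the square cell by cell, using the symbols $0, \ldots, n-1$, and is practical only for very small $n$.

```python
def count_latin_squares(n):
    """Number L(n) of n x n Latin squares on the symbols 0..n-1,
    counted by filling the cells row by row with backtracking."""
    row_used = [set() for _ in range(n)]
    col_used = [set() for _ in range(n)]

    def fill(cell):
        if cell == n * n:
            return 1                      # every cell filled: one Latin square
        r, c = divmod(cell, n)
        total = 0
        for s in range(n):
            if s not in row_used[r] and s not in col_used[c]:
                row_used[r].add(s)
                col_used[c].add(s)
                total += fill(cell + 1)
                row_used[r].remove(s)
                col_used[c].remove(s)
        return total

    return fill(0)


print([count_latin_squares(n) for n in range(1, 5)])
```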


suggest that L(n) grows exceedingly fast. So, all we can hope for are good 
bounds — and these are miraculously supplied by Brégman’s Theorem and 
by the permanent theorem discussed in Chapter 24. 

Take an empty $n \times n$ square and fill it row by row with the numbers $1, \ldots, n$, so that the resulting configuration is a Latin square. There are $n!$ ways to fill the first row, since we may take every permutation. Suppose the first $n - k$ rows are properly filled to give an $(n - k) \times n$ Latin rectangle $R$. In how many ways can we fill the next row? Consider the following bipartite graph $G_k = (U \cup V, E)$, where $U$ is the set of elements and $V$ the set of column positions, with
$$ij \in E \;:\Longleftrightarrow\; i \text{ does not appear in the } j\text{th column of } R.$$
So, exactly the numbers that are joined to $j$ can be used in column $j$ of the $(n - k + 1)$st row. In other words, a proper filling of the next row corresponds to a perfect matching of $G_k$. Now, every element $i \in U$ appears $n - k$ times in $R$, hence it is available in $k$ columns for the next row. Thus $i$ has degree $k$ in $G_k$, and similarly $d(j) = k$ for $j \in V$. (We used this argument already in the proof of Lemma 1 in Chapter 36.)

[Figure: Latin squares of small order]

For $n = 3$ there are $3!\,2! = 12$ fillings of the first row and the first column; the rest is then determined.

[Figure: a Latin rectangle $R$, its bipartite graph $G_2$, and the corresponding matrix $M_2$ with $\operatorname{per} M_2 = 2$]

Note that $\operatorname{per}(\lambda M) = \lambda^n \operatorname{per} M$ for an $n \times n$ matrix $M$.

The case $k = n$ corresponds to the $n!$ fillings of the first row.


Let $M_k$ be the 0/1-matrix corresponding to $G_k$; thus
$$\operatorname{per} M_k = \text{the number of proper fillings of row } n - k + 1.$$


Every row and column in $M_k$ sums to $k$; let us denote the set of 0/1-matrices with this property by $\mathcal{M}(n, k)$. The permanent $\operatorname{per} M_k$ depends, of course, on the setup of $R$, but if we have general lower and upper bounds for matrices in $\mathcal{M}(n, k)$, then by taking the product over all $k$, we obtain lower/upper bounds for $L(n)$.
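The correspondence between fillings of the next row and perfect matchings of $G_k$ can be tested directly. The following sketch uses hypothetical helper names and the symbols $0, \ldots, n-1$. It builds $M_k$ from a Latin rectangle $R$ and compares $\operatorname{per} M_k$ with a direct count of admissible next rows.

```python
from itertools import permutations
from math import prod


def completion_matrix(R, n):
    """M_k for a Latin rectangle R (list of rows over symbols 0..n-1):
    entry (i, j) is 1 iff symbol i does not yet appear in column j."""
    return [[0 if any(row[j] == i for row in R) else 1 for j in range(n)]
            for i in range(n)]


def permanent(M):
    n = len(M)
    return sum(prod(M[i][s[i]] for i in range(n)) for s in permutations(range(n)))


def count_next_rows(R, n):
    """Directly count the permutations that can be appended to R as a new row."""
    return sum(all(all(row[j] != p[j] for row in R) for j in range(n))
               for p in permutations(range(n)))


R = [[0, 1, 2, 3],
     [1, 0, 3, 2]]          # a 2 x 4 Latin rectangle, so k = 2
M = completion_matrix(R, 4)
print(M, permanent(M), count_next_rows(R, 4))
```

Here every row and column of $M$ sums to $k = 2$, so $M \in \mathcal{M}(4, 2)$.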


By Brégman’s Theorem with $d_1 = d_2 = \cdots = d_n = k$ we get right away
$$\operatorname{per} M \le (k!)^{n/k} \quad \text{for all } M \in \mathcal{M}(n, k).$$

Now to the lower bound: If $M$ is in $\mathcal{M}(n, k)$, then $\frac{1}{k}M$ is doubly stochastic, which implies by the permanent theorem in Chapter 24 that
$$\operatorname{per} M = k^n \operatorname{per}\Big(\frac{1}{k}M\Big) \ge k^n\frac{n!}{n^n}.$$


In summary, we have proved the following remarkable bounds. 


Theorem 2. The number $L(n)$ of Latin squares of order $n$ is bounded by
$$\frac{(n!)^{2n}}{n^{n^2}} \le L(n) \le \prod_{k=1}^n (k!)^{n/k}.$$
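Theorem 2 can be compared with the known small values of $L(n)$ quoted earlier. This short Python sketch (function names ours) evaluates both bounds in floating point.

```python
from math import factorial


def latin_lower_bound(n):
    """(n!)^(2n) / n^(n^2), the lower bound of Theorem 2."""
    return factorial(n) ** (2 * n) / n ** (n * n)


def latin_upper_bound(n):
    """prod_{k=1}^n (k!)^(n/k), the upper bound of Theorem 2."""
    result = 1.0
    for k in range(1, n + 1):
        result *= factorial(k) ** (n / k)
    return result


known = {1: 1, 2: 2, 3: 12, 4: 576, 5: 161280}
for n, L in known.items():
    print(n, round(latin_lower_bound(n), 2), L, round(latin_upper_bound(n), 2))
```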


Using the approximations for $n!$ from page 13,
$$e\Big(\frac{n}{e}\Big)^n \le n! \le en\Big(\frac{n}{e}\Big)^n, \qquad (5)$$


we can easily derive from this the following astonishingly simple asymptotic formula.

Corollary. In the limit, the number $L(n)$ of Latin squares of order $n$ satisfies
$$\lim_{n \to \infty}\frac{L(n)^{1/n^2}}{n} = \frac{1}{e^2}.$$
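■ Proof. For the lower bound we get
$$L(n) \ge \frac{(n!)^{2n}}{n^{n^2}} \ge \frac{(n/e)^{2n^2}}{n^{n^2}} = \Big(\frac{n}{e^2}\Big)^{n^2},$$
so
$$\frac{L(n)^{1/n^2}}{n} \ge \frac{1}{e^2} \quad\text{and thus}\quad \liminf_{n \to \infty}\frac{L(n)^{1/n^2}}{n} \ge \frac{1}{e^2}.$$
The upper bound needs a little more work. We will show that for any $\varepsilon > 0$
$$\frac{L(n)^{1/n^2}}{n} < \frac{1 + \varepsilon}{e^2}$$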
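holds when $n$ is large enough. For convenience we set $\ell(n) = L(n)^{1/n^2}$. Using (5) for $k$ in place of $n$, we have (with $\log$ denoting the natural logarithm from now on)
$$\begin{aligned}
\log \ell(n) &\le \frac{1}{n^2}\log\prod_{k=1}^n (k!)^{n/k} = \frac{1}{n}\sum_{k=1}^n\frac{1}{k}\log k!\\
&\le \frac{1}{n}\sum_{k=1}^n\frac{1}{k}\log\Big(ek\Big(\frac{k}{e}\Big)^k\Big)\\
&= \frac{1}{n}\sum_{k=1}^n\frac{1}{k}\big(1 + \log k + k\log k - k\big)\\
&= \frac{1}{n}\sum_{k=1}^n\frac{1}{k} + \frac{1}{n}\sum_{k=1}^n\frac{\log k}{k} + \frac{1}{n}\sum_{k=1}^n\log k - 1. \qquad (6)
\end{aligned}$$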


Now, look at page 13. The first sum is the harmonic number $H_n$, where $H_n \le \log n + 1$. The third sum was also treated there, with
$$\sum_{k=1}^n\log k \le (n + 1)\log(n + 1) - n \le (n + 2)\log n - n,$$
where the second inequality is derived in the margin, for $n \ge 6$. For the second sum in (6) the same integration method that we had used for the third one yields, using that $\frac{\log x}{x}$ is positive for $x > 1$ and monotonically decreasing for $x \ge e$, that
$$\sum_{k=4}^n\frac{\log k}{k} \le \int_3^n\frac{\log x}{x}\,dx \le \int_1^n\frac{\log x}{x}\,dx = \Big[\frac12(\log x)^2\Big]_1^n = \frac12(\log n)^2.$$
Thus the second sum in (6) is smaller than $2 + \frac12(\log n)^2$.

The inequality $(n + 1)^{n+1} \le n^{n+2}$ holds for $n \ge 6$: It may be rewritten as $\big(1 + \frac1n\big)^n\big(1 + \frac1n\big) \le n$, where $\big(1 + \frac1n\big)^n \le e$ and $1 + \frac1n \le 2$; thus the left side is less than $2e < 6 \le n$.
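Putting everything together, we get
$$\log \ell(n) \le \frac{3}{n} + \frac{3\log n}{n} + \frac{(\log n)^2}{2n} + \log n - 2.$$
The first three terms go to 0 as $n$ gets large, and we conclude that for every $\delta > 0$
$$\log \ell(n) < \delta + \log n - 2$$
will hold if $n$ is large enough. Thus we get $\frac{L(n)^{1/n^2}}{n} < \frac{e^{\delta}}{e^2}$ for all large enough $n$; choosing $\delta = \log(1 + \varepsilon)$, this is what we wanted to prove.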
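Numerically, the convergence in the Corollary is slow. The sketch below (our own illustration, with natural logarithms and `math.lgamma` for $\log k!$) evaluates the logarithms of the lower and upper bounds from Theorem 2 for $L(n)^{1/n^2}/n$. Both tend to $\log(1/e^2) = -2$.

```python
from math import lgamma, log


def log_factorial(k):
    """Natural logarithm of k!."""
    return lgamma(k + 1)


def log_lower(n):
    """log of the lower bound of Theorem 2 for L(n)^(1/n^2) / n:
    (1/n^2) * (2n log n! - n^2 log n) - log n."""
    return 2 * log_factorial(n) / n - 2 * log(n)


def log_upper(n):
    """log of the upper bound of Theorem 2 for L(n)^(1/n^2) / n:
    (1/n) * sum_k (1/k) log k! - log n."""
    return sum(log_factorial(k) / k for k in range(1, n + 1)) / n - log(n)


for n in (10, 100, 1000, 10000):
    print(n, round(log_lower(n), 4), round(log_upper(n), 4))
```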
The actual minimum $L(X)$ can, e.g., be computed by Huffman’s algorithm, a classic in computer science.

Remember $0 \cdot \log_2 0 = 0$.


Appendix: More about entropy 


What was Shannon’s alternative approach to entropy? 

As before, let $X$ be a random variable with value set $\{a_1, \ldots, a_n\}$ and $p_i = \mathrm{Prob}(X = a_i)$. We employ a certain strategy $S$ of yes/no questions until we know the value of $X$ for sure. If our strategy leads us to ask $\ell_i$ questions in the case of the outcome $X = a_i$, then $L(S) = \sum_{i=1}^n p_i\ell_i$ is the expected number of questions. Of course, a good strategy will want to ask few questions for very likely outcomes $a_i$ (when $p_i$ is large), so as to minimize the average number.

As an example, suppose that the probabilities for throwing a loaded die are $p_1 = \frac14$, $p_2 = p_3 = p_4 = \frac16$, and $p_5 = p_6 = \frac18$. A strategy might be the following. First question: “Is the outcome $\le 3$?” If yes, which happens with probability $\frac{7}{12}$, ask the second question: “Is it 1?” If yes again, we are done; otherwise we need one more question to decide whether the throw shows 2 or 3. Proceeding in analogous fashion if the first answer was no, we get $\ell_1 = 2$, $\ell_2 = \ell_3 = 3$, $\ell_4 = 2$, $\ell_5 = \ell_6 = 3$, thus
$$L(S) = 2\Big(\frac14 + \frac16\Big) + 3\Big(\frac16 + \frac16 + \frac18 + \frac18\Big) = \frac{31}{12}.$$
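The numbers in this example are easy to check exactly with Python’s `fractions` module. The sketch below takes the probabilities $\frac14, \frac16, \frac16, \frac16, \frac18, \frac18$ and the question counts of the strategy $S$ as given in the example. It also runs Huffman’s algorithm (mentioned in the margin) in its standard form, where the optimal expected number of questions equals the sum of the weights of all merged nodes.

```python
import heapq
from fractions import Fraction as F
from itertools import count
from math import log2

# Loaded-die example: probabilities p_i and question counts l_i of the strategy S
p = [F(1, 4), F(1, 6), F(1, 6), F(1, 6), F(1, 8), F(1, 8)]
lengths = [2, 3, 3, 2, 3, 3]


def expected_questions(p, lengths):
    """L(S) = sum_i p_i * l_i, computed exactly."""
    return sum(pi * li for pi, li in zip(p, lengths))


def entropy(p):
    """H(X) = -sum_i p_i log2 p_i (floating point)."""
    return -sum(float(pi) * log2(float(pi)) for pi in p if pi > 0)


def huffman_expected_length(p):
    """Smallest possible expected number of yes/no questions (Huffman's algorithm):
    repeatedly merge the two least likely groups; the optimum is the sum of all merged weights."""
    tie = count()                      # tie-breaker, so equal weights are ordered by insertion
    heap = [(pi, next(tie)) for pi in p]
    heapq.heapify(heap)
    total = F(0)
    while len(heap) > 1:
        a, _ = heapq.heappop(heap)
        b, _ = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, (a + b, next(tie)))
    return total


print(expected_questions(p, lengths), huffman_expected_length(p), round(entropy(p), 4))
```

For these probabilities Huffman’s algorithm also yields $\frac{31}{12}$, so the strategy $S$ is already optimal, and $H(X) \approx 2.54$ lies below it, as the next paragraph predicts.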


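Shannon now proved that the entropy $H(X) = -\sum_{i=1}^n p_i\log_2 p_i$ is a lower bound for the expected number of questions $L(S) = \sum_{i=1}^n p_i\ell_i$ for every conceivable strategy $S$. Let us check this! First we have that $\sum_{i=1}^n 2^{-\ell_i} \le 1$ (why?), and the inequality $\log_2 x \le (x - 1)\log_2 e$ for $x > 0$ (this is $\ln x \le x - 1$), together with $\sum_{i=1}^n p_i = 1$, yields
$$\sum_{i=1}^n p_i\log_2\frac{2^{-\ell_i}}{p_i} \le \log_2 e\sum_{i=1}^n p_i\Big(\frac{2^{-\ell_i}}{p_i} - 1\Big) = \log_2 e\Big(\sum_{i=1}^n 2^{-\ell_i} - \sum_{i=1}^n p_i\Big) \le 0.$$
But this means that $-\sum_{i=1}^n p_i\ell_i \le \sum_{i=1}^n p_i\log_2 p_i$, or $L(S) \ge H(X)$.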


Conversely, it is easy to find a strategy $S_0$ with $L(S_0) < H(X) + 1$, hence
$$H(X) \le L(X) := \min_S L(S) < H(X) + 1.$$
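One standard way to obtain such a strategy $S_0$ is to use Shannon’s code lengths $\ell_i = \lceil \log_2(1/p_i) \rceil$. That a yes/no strategy with these question counts exists follows from the converse of Kraft’s inequality (any lengths with $\sum_i 2^{-\ell_i} \le 1$ are realizable), which we take for granted here. The sketch only computes the lengths, exactly, for a hypothetical distribution and checks $H(X) \le L(S_0) < H(X) + 1$.

```python
from fractions import Fraction
from math import log2


def shannon_lengths(p):
    """l_i = smallest integer with 2^(-l_i) <= p_i, i.e. l_i = ceil(log2(1/p_i)).
    Computed exactly with Fractions to avoid floating-point rounding."""
    lengths = []
    for pi in p:
        l = 0
        while Fraction(1, 2 ** l) > pi:
            l += 1
        lengths.append(l)
    return lengths


def entropy(p):
    return -sum(float(pi) * log2(float(pi)) for pi in p if pi > 0)


# A hypothetical distribution, chosen for illustration only
p = [Fraction(2, 5), Fraction(3, 10), Fraction(1, 5), Fraction(1, 10)]
lengths = shannon_lengths(p)
kraft = sum(Fraction(1, 2 ** l) for l in lengths)     # at most 1, so such a strategy exists
expected = sum(pi * l for pi, l in zip(p, lengths))   # L(S_0)
print(lengths, kraft, expected, round(entropy(p), 4))
```

Since $\ell_i < \log_2(1/p_i) + 1$, averaging with weights $p_i$ gives $L(S_0) < H(X) + 1$.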


Looking at $n$-fold repetitions $X^n$ of the experiment $X$, Shannon went on to show that the expected number of questions per experiment $\frac{1}{n}L(X^n)$ used by optimal strategies for $X^n$ converges to $H(X)$ for $n \to \infty$. (Shannon called this the “Fundamental theorem for a noiseless channel.”)
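This block argument can be watched in action: treat $n$ independent repetitions of $X$ as one experiment $X^n$, find an optimal question strategy for $X^n$ with Huffman’s algorithm, and divide by $n$. In this sketch the biased coin with $p = 2/3$ is a hypothetical example, and we rely on the standard fact that Huffman’s algorithm gives the minimum expected number of yes/no questions.

```python
import heapq
from fractions import Fraction
from itertools import count, product
from math import log2, prod


def huffman_expected_length(weights):
    """Minimum expected number of yes/no questions for the given outcome
    probabilities (Huffman's algorithm; sum of the merged node weights)."""
    tie = count()
    heap = [(w, next(tie)) for w in weights]
    heapq.heapify(heap)
    total = Fraction(0)
    while len(heap) > 1:
        a, _ = heapq.heappop(heap)
        b, _ = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, (a + b, next(tie)))
    return total


def per_experiment_cost(p, n):
    """(1/n) L(X^n): optimal questions per experiment for n independent repetitions."""
    block = [prod(outcome) for outcome in product(p, repeat=n)]
    return huffman_expected_length(block) / n


p = [Fraction(2, 3), Fraction(1, 3)]      # a hypothetical biased coin
H = -sum(float(q) * log2(float(q)) for q in p)
for n in range(1, 7):
    print(n, float(per_experiment_cost(p, n)), H)
```

Applying the bound $H \le L < H + 1$ to $X^n$, whose entropy is $nH(X)$, shows that the cost per experiment lies in $[H(X), H(X) + \frac1n)$.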


Now to the three facts that we used in the proof of Theorem 1.

(A) $H(X) \le \log_2(|\operatorname{supp} X|)$.

■ Proof. Assume without loss of generality that $p_i > 0$ for all $i$. Consider the general form of the AM-GM inequality $a_1^{p_1}\cdots a_n^{p_n} \le p_1a_1 + \cdots + p_na_n$ on page 144. Set $a_i = \frac{1}{p_i}$ and take the logarithm to obtain
$$\sum_{i=1}^n p_i\log_2\frac{1}{p_i} \le \log_2\Big(\sum_{i=1}^n p_i\frac{1}{p_i}\Big) = \log_2 n.$$
Equality holds if and only if $p_1 = \cdots = p_n = \frac{1}{n}$, that is, if we have uniform distribution.


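(B) $H(X, Y) = H(X) + H(Y \mid X)$.

■ Proof. We use the same notation as before and compute
$$\begin{aligned}
H(X, Y) &= -\sum_{i,j} p(a_i, b_j)\log_2 p(a_i, b_j)\\
&= -\sum_{i,j} p(a_i, b_j)\log_2\big(p(a_i)\,p(b_j \mid a_i)\big)\\
&= -\sum_{i,j} p(a_i, b_j)\log_2 p(a_i) - \sum_{i,j} p(a_i)\,p(b_j \mid a_i)\log_2 p(b_j \mid a_i)\\
&= -\sum_{i=1}^m p(a_i)\log_2 p(a_i) + H(Y \mid X) = H(X) + H(Y \mid X),
\end{aligned}$$
where in the last step we used $\sum_j p(a_i, b_j) = p(a_i)$. The general formula follows by induction.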


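(C) $H(Y \mid X) \le \sum_{j=1}^d \mathrm{Prob}(X \in E_j)\log_2 j$.

■ Proof. We have $H(Y \mid X) = \sum_{i=1}^m p(a_i)H(Y \mid a_i)$. Partitioning $\operatorname{supp} X$ into the subsets $E_j$ given by the assumption (outcomes outside the support contribute nothing) and using (A) for each $H(Y \mid a)$, we get
$$\begin{aligned}
H(Y \mid X) &= \sum_{j=1}^d\sum_{a \in E_j} p(a)H(Y \mid a)\\
&\le \sum_{j=1}^d\sum_{a \in E_j} p(a)\log_2 j = \sum_{j=1}^d \mathrm{Prob}(X \in E_j)\log_2 j.
\end{aligned}$$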


References 


[1] N. ALON & J. SPENCER: The Probabilistic Method, Third edition, Wiley-Interscience 2008.

[2] L. BREGMAN: Some properties of nonnegative matrices and their permanents, Soviet Math. Doklady 14 (1973), 945-949.

[3] A. KHINCHIN: Mathematical Foundations of Information Theory, Dover Publications 1957.

[4] B. D. McKAY & I. M. WANLESS: On the number of Latin squares, Annals of Combinatorics 9 (2005), 335-344.

[5] J. RADHAKRISHNAN: An entropy proof of Bregman’s theorem, J. Combinatorial Theory, Ser. A 77 (1997), 161-164.

[6] A. SCHRIJVER: A short proof of Minc’s conjecture, J. Combinatorial Theory, Ser. A 25 (1978), 80-83.


270 Permanents and the power of entropy 


[7] H. MINC: Permanents, Encyclopedia of Mathematics and its Applications, Vol. 6, Addison-Wesley, Reading MA 1978; reissued by Cambridge University Press 1984.

[8] C. SHANNON: A Mathematical Theory of Communication, Bell System Technical Journal 27 (1948), 379-423, 623-656.


“Do you get any news?”
“Sure! $-\sum p_i \log_2 p_i$ of them!”


The Dinitz problem Chapter 38 


The four-color problem was a main driving force for the development of 
graph theory as we know it today, and coloring is still a topic that many 
graph theorists like best. Here is a simple-sounding coloring problem, 
raised by Jeff Dinitz in 1978, which defied all attacks until its astonishingly simple solution by Fred Galvin fifteen years later.

[Figure: the cell $(i, j)$ with its color set $C(i, j)$]

Consider $n^2$ cells arranged in an $n \times n$ square, and let $(i, j)$ denote the cell in row $i$ and column $j$. Suppose that for every cell $(i, j)$ we are given a set $C(i, j)$ of $n$ colors.


Is it then always possible to color the whole array by picking for 
each cell (i,j) a color from its set C(i,j) such that the colors in 
each row and each column are distinct? 


As a start consider the case when all color sets $C(i, j)$ are the same, say $\{1, 2, \ldots, n\}$. Then the Dinitz problem reduces to the following task: Fill the $n \times n$ square with the numbers $1, 2, \ldots, n$ in such a way that the numbers
in any row and column are distinct. In other words, any such coloring 
corresponds to a Latin square, as discussed in the previous chapter. So, in 
this case, the answer to our question is “yes.” 

Since this is so easy, why should it be so much harder in the general case when the set $C := \bigcup_{i,j} C(i, j)$ contains even more than $n$ colors? The difficulty derives from the fact that not every color of $C$ is available at each cell. For example, whereas in the Latin square case we can clearly choose an arbitrary permutation of the colors for the first row, this is not so anymore in the general problem. Already the case $n = 2$ illustrates this difficulty. Suppose we are given the color sets that are indicated in the figure. If we choose the colors 1 and 2 for the first row, then we are in trouble since we would then have to pick color 3 for both cells in the second row.
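For tiny arrays the Dinitz question can be settled by brute force. The sketch below enumerates all choices of colors from the lists. The color sets are a hypothetical $n = 2$ instance of our own, chosen to show exactly the effect just described; they need not be the ones in the figure.

```python
from itertools import product


def list_colorings(lists):
    """Yield all proper list colorings of the n x n array.

    lists[i][j] is the color set C(i, j); a coloring picks one color per cell
    so that the colors in each row and each column are distinct.
    Brute force, so only usable for tiny n.
    """
    n = len(lists)
    cells = [(i, j) for i in range(n) for j in range(n)]
    for choice in product(*(sorted(lists[i][j]) for i, j in cells)):
        color = dict(zip(cells, choice))
        rows_ok = all(len({color[i, j] for j in range(n)}) == n for i in range(n))
        cols_ok = all(len({color[i, j] for i in range(n)}) == n for j in range(n))
        if rows_ok and cols_ok:
            yield color


# Hypothetical color sets for n = 2 (not necessarily those of the figure)
lists = [[{1, 2}, {2, 3}],
         [{1, 3}, {2, 3}]]
colorings = list(list_colorings(lists))
print(len(colorings), colorings)
```

With these lists a coloring exists, but none of them starts the first row with colors 1 and 2.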


Before we tackle the Dinitz problem, let us rephrase the situation in the 
language of graph theory. As usual we only consider graphs $G = (V, E)$ without loops and multiple edges. Let $\chi(G)$ denote the chromatic number of the graph, that is, the smallest number of colors that one can assign to the vertices such that adjacent vertices receive different colors.

In other words, a coloring calls for a partition of $V$ into classes (colored with the same color) such that there are no edges within a class. Calling a set $A \subseteq V$ independent if there are no edges within $A$, we infer that the chromatic number is the smallest number of independent sets which partition the vertex set $V$.


© Springer-Verlag GmbH Germany, part of Springer Nature 2018 
[Figure: the graph $S_3$]

[Figure: $K_{2,4}$ with color sets $\{1,2\}, \{3,4\}$ on the left vertices and $\{1,3\}, \{1,4\}, \{2,3\}, \{2,4\}$ on the right vertices]


In 1976 Vizing, and three years later Erdős, Rubin, and Taylor, studied the following coloring variant which leads us straight to the Dinitz problem. Suppose in the graph $G = (V, E)$ we are given a set $C(v)$ of colors for each vertex $v$. A list coloring is a coloring $c : V \to \bigcup_{v \in V} C(v)$ where $c(v) \in C(v)$ for each $v \in V$. The definition of the list chromatic number $\chi_\ell(G)$ should now be clear: It is the smallest number $k$ such that for any list of color sets $C(v)$ with $|C(v)| = k$ for all $v \in V$ there always exists a list coloring. Of course, we have $\chi_\ell(G) \le |V|$ (we never run out of colors). Since ordinary coloring is just the special case of list coloring when all sets $C(v)$ are equal, we obtain for any graph $G$
$$\chi(G) \le \chi_\ell(G).$$


To get back to the Dinitz problem, consider the graph $S_n$, which has as vertex set the $n^2$ cells of our $n \times n$ array with two cells adjacent if and only if they are in the same row or column.

Since any $n$ cells in a row are pairwise adjacent we need at least $n$ colors. Furthermore, any coloring with $n$ colors corresponds to a Latin square, with the cells occupied by the same number forming a color class. Since Latin squares, as we have seen, exist, we infer $\chi(S_n) = n$, and the Dinitz problem can now be succinctly stated as
$$\chi_\ell(S_n) = n\,?$$


One might think that perhaps $\chi(G) = \chi_\ell(G)$ holds for any graph $G$, but this is a long shot from the truth. Consider the graph $G = K_{2,4}$. The chromatic number is 2 since we may use one color for the two left vertices and the second color for the vertices on the right. But now suppose that we are given the color sets indicated in the figure.

To color the left vertices we have the four possibilities $1|3$, $1|4$, $2|3$ and $2|4$, but each of these pairs appears as a color set on the right-hand side, and the right vertex carrying that pair then cannot be colored; so a list coloring is not possible. Hence $\chi_\ell(G) \ge 3$, and the reader may find it fun to prove $\chi_\ell(G) = 3$ (there is no need to try out all possibilities!). Generalizing this example, it is not hard to find graphs $G$ where $\chi(G) = 2$, but $\chi_\ell(G)$ is arbitrarily large! So the list coloring problem is not as easy as it looks at first glance.
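The $K_{2,4}$ example can likewise be confirmed by exhaustive search. The vertex names are ours; the lists are $\{1,2\}, \{3,4\}$ on the left and the four pairs $\{1,3\}, \{1,4\}, \{2,3\}, \{2,4\}$ on the right.

```python
from itertools import product


def has_list_coloring(edges, lists):
    """Brute force: is there a choice c(v) in lists[v] with c(u) != c(v) on every edge?"""
    vertices = sorted(lists)
    for choice in product(*(sorted(lists[v]) for v in vertices)):
        c = dict(zip(vertices, choice))
        if all(c[u] != c[v] for u, v in edges):
            return True
    return False


# K_{2,4}: left vertices L1, L2; right vertices R1..R4; every left-right pair is an edge
left, right = ["L1", "L2"], ["R1", "R2", "R3", "R4"]
edges = [(u, v) for u in left for v in right]
lists = {"L1": {1, 2}, "L2": {3, 4},
         "R1": {1, 3}, "R2": {1, 4}, "R3": {2, 3}, "R4": {2, 4}}
print(has_list_coloring(edges, lists))                             # False
print(has_list_coloring(edges, {v: {1, 2} for v in left + right}))  # True, chi(K_{2,4}) = 2
```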


Back to the Dinitz problem. A significant step towards the solution was made by Jeanette Janssen in 1992 when she proved $\chi_\ell(S_n) \le n + 1$, and the coup de grâce was delivered by Fred Galvin by ingeniously combining two results, both of which had long been known. We are going to discuss these two results and then show how they imply $\chi_\ell(S_n) = n$.

First we fix some notation. Suppose $v$ is a vertex of the graph $G$; then we denote as before by $d(v)$ the degree of $v$. In our square graph $S_n$, every vertex has degree $2n - 2$, accounting for the $n - 1$ other vertices in the same row and the $n - 1$ other vertices in the same column. For a subset $A \subseteq V$ we denote by $G_A$ the subgraph which has $A$ as vertex set and which contains all edges of $G$ between vertices of $A$. We call $G_A$ the subgraph induced by $A$, and say that $H$ is an induced subgraph of $G$ if $H = G_A$ for some $A$.
To state our first result we need directed graphs $\vec{G} = (V, E)$, that is, graphs where every edge $e$ has an orientation. The notation $e = (u, v)$ means that there is an arc $e$, also denoted by $u \to v$, whose initial vertex is $u$ and whose terminal vertex is $v$. It then makes sense to speak of the outdegree $d^+(v)$ resp. the indegree $d^-(v)$, where $d^+(v)$ counts the number of edges with $v$ as initial vertex, and similarly for $d^-(v)$; furthermore, $d^+(v) + d^-(v) = d(v)$. When we write $G$, we mean the graph $\vec{G}$ without the orientations.

The following concept originated in the analysis of games and will play a 
crucial role in our discussion. 


Definition 1. Let $\vec{G} = (V, E)$ be a directed graph. A kernel $K \subseteq V$ is a subset of the vertices such that

(i) $K$ is independent in $G$, and
(ii) for every $u \notin K$ there exists a vertex $v \in K$ with an edge $u \to v$.


Let us look at the example in the figure. The set {b,c, f} constitutes a 
kernel, but the subgraph induced by {a,c, e} does not have a kernel since 
the three edges cycle through the vertices. 
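Kernels of small digraphs can be found by trying all vertex subsets. This brute-force sketch (exponential time, for illustration only; the function name is ours) confirms that a directed 3-cycle such as the one on $\{a, c, e\}$ has no kernel, while a directed path does.

```python
from itertools import combinations


def kernels(vertices, arcs):
    """All kernels of the digraph (vertices, arcs), by trying every subset K.

    K is a kernel if (i) no arc has both ends in K, and
    (ii) every vertex outside K has an arc into K.
    """
    found = []
    for r in range(len(vertices) + 1):
        for K in map(set, combinations(vertices, r)):
            independent = not any(u in K and v in K for u, v in arcs)
            absorbing = all(any(u == w and v in K for u, v in arcs)
                            for w in vertices if w not in K)
            if independent and absorbing:
                found.append(K)
    return found


cycle = [("a", "c"), ("c", "e"), ("e", "a")]    # the three edges cycle through a, c, e
path = [("a", "c"), ("c", "e")]                  # an acyclic comparison case
print(kernels(["a", "c", "e"], cycle))           # []
print(kernels(["a", "c", "e"], path))            # the single kernel {a, e}
```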


With all these preparations we are ready to state the first result. 


Lemma 1. Let $\vec{G} = (V, E)$ be a directed graph, and suppose that for each vertex $v \in V$ we have a color set $C(v)$ that is larger than the outdegree, $|C(v)| \ge d^+(v) + 1$. If every induced subgraph of $\vec{G}$ possesses a kernel, then there exists a list coloring of $G$ with a color from $C(v)$ for each $v$.


■ Proof. We proceed by induction on $|V|$. For $|V| = 1$ there is nothing to prove. Choose a color $c \in C = \bigcup_{v \in V} C(v)$ and set
$$A(c) := \{v \in V : c \in C(v)\}.$$
By hypothesis, the induced subgraph $\vec{G}_{A(c)}$ possesses a kernel $K(c)$. Now we color all $v \in K(c)$ with the color $c$ (this is possible since $K(c)$ is independent), and delete $K(c)$ from $\vec{G}$ and $c$ from $C$. Let $\vec{G}'$ be the induced subgraph of $\vec{G}$ on $V \setminus K(c)$ with $C'(v) = C(v) \setminus \{c\}$ as the new list of color sets. Notice that for each $v \in A(c) \setminus K(c)$, the outdegree $d^+(v)$ is decreased by at least 1 (due to condition (ii) of a kernel). So $d^+(v) + 1 \le |C'(v)|$ still holds in $\vec{G}'$. The same condition also holds for the vertices outside $A(c)$, since in this case the color sets $C(v)$ remain unchanged. Moreover, every induced subgraph of $\vec{G}'$ is an induced subgraph of $\vec{G}$ and thus has a kernel. The new graph $\vec{G}'$ contains fewer vertices than $\vec{G}$, and we are done by induction.
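The proof of Lemma 1 is really an algorithm. The following sketch (hypothetical function names, brute-force kernel search, so only for tiny graphs) colors the vertices class by class exactly as in the induction: pick a color $c$, find a kernel $K(c)$ of the subgraph induced by $A(c)$, give $K(c)$ the color $c$, and delete it.

```python
from itertools import combinations


def find_kernel(vertices, arcs):
    """Brute-force search for a kernel of the digraph induced on the set `vertices`."""
    vs = list(vertices)
    inner = [(u, v) for u, v in arcs if u in vertices and v in vertices]
    for r in range(len(vs) + 1):
        for K in map(set, combinations(vs, r)):
            if any(u in K and v in K for u, v in inner):
                continue                                   # not independent
            if all(any(u == w and v in K for u, v in inner) for w in vs if w not in K):
                return K                                   # every outside vertex points into K
    return None


def kernel_list_coloring(vertices, arcs, lists):
    """Lemma 1 as an algorithm; assumes |C(v)| >= d+(v) + 1 and kernels everywhere."""
    remaining = set(vertices)
    lists = {v: set(lists[v]) for v in vertices}           # work on copies
    coloring = {}
    while remaining:
        c = min(set().union(*(lists[v] for v in remaining)))   # a color still in use
        A = {v for v in remaining if c in lists[v]}            # A(c)
        K = find_kernel(A, arcs)
        if K is None:
            raise ValueError("hypothesis of Lemma 1 violated: no kernel")
        for v in K:
            coloring[v] = c
        remaining -= K
        for v in remaining:
            lists[v].discard(c)                               # C'(v) = C(v) minus {c}
    return coloring


# Transitive tournament i -> j for i < j, with lists of size d+(v) + 1
vertices = [0, 1, 2, 3]
arcs = [(i, j) for i in vertices for j in vertices if i < j]
lists = {0: {1, 2, 3, 4}, 1: {1, 2, 3}, 2: {1, 2}, 3: {1}}
print(kernel_list_coloring(vertices, arcs, lists))
```

A transitive tournament is acyclic, so all its induced subgraphs have kernels; with these lists the procedure is forced to give the four pairwise adjacent vertices four different colors.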


The method of attack for the Dinitz problem is now obvious: We have to find an orientation of the graph $S_n$ with outdegrees $d^+(v) \le n - 1$ for all $v$ and which ensures the existence of a kernel for all induced subgraphs. This is accomplished by our second result.

Again we need a few preparations. Recall (from Chapter 11) that a bipartite graph $G = (X \cup Y, E)$ is a graph with the following property: The vertex set $V$ is split into two parts $X$ and $Y$ such that every edge has one endvertex in $X$ and the other in $Y$. In other words, the bipartite graphs are precisely those which can be colored with two colors (one for $X$ and one for $Y$).
[Figure: a bipartite graph with a matching]

[Figure: the example below; the bold edges constitute a stable matching, and in each priority list the choice leading to a stable matching is printed bold]


Now we come to an important concept, “stable matchings,” with a down-to-earth interpretation. A matching $M$ in a bipartite graph $G = (X \cup Y, E)$ is a set of edges such that no two edges in $M$ have a common endvertex. In the displayed graph the edges drawn in bold lines constitute a matching. Consider $X$ to be a set of men and $Y$ a set of women and interpret $uv \in E$ to mean that $u$ and $v$ might marry. A matching is then a mass-wedding with no person committing bigamy. For our purposes we need a more refined (and more realistic?) version of a matching, suggested by David Gale and Lloyd S. Shapley. Clearly, in real life every person has preferences, and this is what we add to the set-up. In $G = (X \cup Y, E)$ we assume that for every $v \in X \cup Y$ there is a ranking of the set $N(v)$ of vertices adjacent to $v$, $N(v) = \{z_1 > z_2 > \cdots > z_{d(v)}\}$. Thus $z_1$ is the top choice for $v$, followed by $z_2$, and so on.


Definition 2. A matching $M$ of $G = (X \cup Y, E)$ is called stable if the following condition holds: Whenever $uv \in E \setminus M$, $u \in X$, $v \in Y$, then either $uy \in M$ with $y > v$ in $N(u)$ or $xv \in M$ with $x > u$ in $N(v)$, or both.


In our real life interpretation a set of marriages is stable if it never happens 
that u and v are not married but u prefers v to his partner (if he has one at 
all) and v prefers u to her mate (if she has one at all), which would clearly 
be an unstable situation. 


Before proving our second result let us take a look at the following example: 


{A > C}        a    A    {c > d > a}
{C > D > B}    b    B    {b}
{A > D}        c    C    {a > b}
{A}            d    D    {c > b}


Notice that in this example there is a unique largest matching $M$ with four edges, $M = \{aC, bB, cD, dA\}$, but $M$ is not stable (consider $cA$).


Lemma 2. A stable matching always exists. 


■ Proof. Consider the following algorithm. In the first stage all men $u \in X$ propose to their top choice. If a girl receives more than one proposal she picks the one she likes best and keeps him on a string, and if she receives just one proposal she keeps that one on a string. The remaining men are rejected and form the reservoir $R$. In the second stage all men in $R$ propose to their next choice. The women compare the proposals (together with the one on the string, if there is one), pick their favorite and put him on the string. The rest is rejected and forms the new set $R$. Now the men in $R$ propose to their next choice, and so on. A man who has proposed to his last choice and is again rejected drops out from further consideration (as well as from the reservoir). Clearly, after some time the reservoir $R$ is empty, and at this point the algorithm stops.
Claim. When the algorithm stops, then the men on the strings together with the corresponding girls form a stable matching.

Notice first that the men on the string of a particular girl move there in increasing preference (of the girl) since at each stage the girl compares the new proposals with the present mate and then picks the new favorite. Hence if $uv \in E$ but $uv \notin M$, then either $u$ never proposed to $v$, in which case he found a better mate before he even got around to $v$, implying $uy \in M$ with $y > v$ in $N(u)$, or $u$ proposed to $v$ but was rejected, implying $xv \in M$ with $x > u$ in $N(v)$. But this is exactly the condition of a stable matching.
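The proposal algorithm from the proof translates almost line by line into code. This sketch (our own function names) runs it, stage by stage, on the example above and checks Definition 2 directly.

```python
def stable_matching(men_prefs, women_prefs):
    """Stage-by-stage proposal algorithm from the proof of Lemma 2.

    men_prefs[u]   : acceptable women, best first (u's ranking of N(u))
    women_prefs[v] : acceptable men, best first (v's ranking of N(v))
    Returns {woman: man} for the men on the strings when the reservoir is empty.
    """
    rank = {v: {u: r for r, u in enumerate(prefs)} for v, prefs in women_prefs.items()}
    next_choice = {u: 0 for u in men_prefs}     # index of the next woman to propose to
    string = {}                                 # woman -> man currently on her string
    reservoir = list(men_prefs)
    while reservoir:
        proposals = {}
        for u in reservoir:
            if next_choice[u] < len(men_prefs[u]):      # otherwise u drops out
                v = men_prefs[u][next_choice[u]]
                next_choice[u] += 1
                proposals.setdefault(v, []).append(u)
        reservoir = []
        for v, suitors in proposals.items():
            if v in string:
                suitors.append(string[v])               # compare with the man on the string
            best = min(suitors, key=lambda u: rank[v][u])
            string[v] = best
            reservoir.extend(u for u in suitors if u != best)
    return string


def is_stable(matching, men_prefs, women_prefs):
    """Definition 2: for every edge uv outside M, u or v prefers a present partner."""
    partner_of_man = {u: v for v, u in matching.items()}
    for u, prefs in men_prefs.items():
        for v in prefs:
            if matching.get(v) == u:
                continue
            u_happier = u in partner_of_man and prefs.index(partner_of_man[u]) < prefs.index(v)
            w = women_prefs[v]
            v_happier = v in matching and w.index(matching[v]) < w.index(u)
            if not (u_happier or v_happier):
                return False
    return True


men = {"a": ["A", "C"], "b": ["C", "D", "B"], "c": ["A", "D"], "d": ["A"]}
women = {"A": ["c", "d", "a"], "B": ["b"], "C": ["a", "b"], "D": ["c", "b"]}
M = stable_matching(men, women)
print(M, is_stable(M, men, women))
```

On the example it returns $\{aC, bD, cA\}$, with $d$ left unmatched, while the maximum matching $\{aC, bB, cD, dA\}$ fails the stability test because of $cA$.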


Putting Lemmas 1 and 2 together, we now get Galvin’s solution of the Dinitz problem.

Theorem. We have $\chi_\ell(S_n) = n$ for all $n$.


■ Proof. As before we denote the vertices of $S_n$ by $(i, j)$, $1 \le i, j \le n$. Thus $(i, j)$ and $(r, s)$ are adjacent if and only if $i = r$ or $j = s$. Take any Latin square $L$ with letters from $\{1, 2, \ldots, n\}$ and denote by $L(i, j)$ the entry in cell $(i, j)$. Next make $S_n$ into a directed graph $\vec{S}_n$ by orienting the horizontal edges $(i, j) \to (i, j')$ if $L(i, j) < L(i, j')$ and the vertical edges $(i, j) \to (i', j)$ if $L(i, j) > L(i', j)$. Thus, horizontally we orient from the smaller to the larger element, and vertically the other way around. (In the margin we have an example for $n = 3$.)

Notice that we obtain $d^+(i, j) = n - 1$ for all $(i, j)$. In fact, if $L(i, j) = k$, then $n - k$ cells in row $i$ contain an entry larger than $k$, and $k - 1$ cells in column $j$ have an entry smaller than $k$.

By Lemma 1 it remains to show that every induced subgraph of $\vec{S}_n$ possesses a kernel. Consider a subset $A \subseteq V$, and let $X$ be the set of rows of $L$, and $Y$ the set of its columns. Associate to $A$ the bipartite graph $G = (X \cup Y, A)$, where every $(i, j) \in A$ is represented by the edge $ij$ with $i \in X$, $j \in Y$. In the example in the margin the cells of $A$ are shaded.

The orientation on $\vec{S}_n$ naturally induces a ranking on the neighborhoods in $G = (X \cup Y, A)$ by setting $j' > j$ in $N(i)$ if $(i, j) \to (i, j')$ in $\vec{S}_n$, respectively $i' > i$ in $N(j)$ if $(i, j) \to (i', j)$. By Lemma 2, $G = (X \cup Y, A)$ possesses a stable matching $M$. This $M$, viewed as a subset of $A$, is our desired kernel! To see why, note first that $M$ is independent in $A$ since as edges in $G = (X \cup Y, A)$ they do not share an endvertex $i$ or $j$. Secondly, if $(i, j) \in A \setminus M$, then by the definition of a stable matching there either exists $(i, j') \in M$ with $j' > j$ or $(i', j) \in M$ with $i' > i$, which for $\vec{S}_n$ means $(i, j) \to (i, j') \in M$ or $(i, j) \to (i', j) \in M$, and the proof is complete.
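Putting the pieces together gives an algorithm for list coloring $S_n$. The sketch below makes two choices of its own: it orients $S_n$ with the Latin square $L(i, j) = (i + j) \bmod n$ (entries $0, \ldots, n-1$), and it finds stable matchings with the common one-proposal-at-a-time variant of the algorithm from Lemma 2. The stable matching serves as the kernel, and the coloring proceeds as in the proof of Lemma 1.

```python
import random


def kernel_from_stable_matching(A, L):
    """Kernel of the subdigraph of S_n induced by the cells in A, via a stable matching.

    Rows are the 'men', columns the 'women'.  Row i prefers column j' to j when
    (i, j) -> (i, j'), i.e. L[i][j'] > L[i][j]; column j prefers row i' to i when
    (i, j) -> (i', j), i.e. L[i'][j] < L[i][j].  The matching is returned as a set of cells.
    """
    prefs = {}
    for (i, j) in A:
        prefs.setdefault(i, []).append(j)
    for i in prefs:
        prefs[i].sort(key=lambda j: -L[i][j])      # best column first
    nxt = {i: 0 for i in prefs}                     # next column row i will propose to
    holds = {}                                      # column j -> row currently "on the string"
    free = list(prefs)
    while free:
        i = free.pop()
        if nxt[i] == len(prefs[i]):
            continue                                # row i has exhausted its list
        j = prefs[i][nxt[i]]
        nxt[i] += 1
        if j not in holds:
            holds[j] = i
        elif L[i][j] < L[holds[j]][j]:              # column j prefers the new row
            free.append(holds[j])
            holds[j] = i
        else:
            free.append(i)
    return {(i, j) for j, i in holds.items()}


def galvin_list_coloring(lists):
    """List coloring of S_n from color sets lists[i][j], each of size n.

    Orientation from the (assumed) Latin square L(i, j) = (i + j) mod n; then the
    proof of Lemma 1: pick a color c, take a kernel of the cells whose lists contain
    c, give all of them color c, delete them, and remove c from the other lists.
    """
    n = len(lists)
    L = [[(i + j) % n for j in range(n)] for i in range(n)]
    C = {(i, j): set(lists[i][j]) for i in range(n) for j in range(n)}
    remaining, coloring = set(C), {}
    while remaining:
        c = min(set().union(*(C[v] for v in remaining)))
        A = {v for v in remaining if c in C[v]}
        K = kernel_from_stable_matching(A, L)
        for v in K:
            coloring[v] = c
        remaining -= K
        for v in remaining:
            C[v].discard(c)
    return coloring


if __name__ == "__main__":
    random.seed(0)
    n = 4
    lists = [[set(random.sample(range(2 * n), n)) for _ in range(n)] for _ in range(n)]
    print(galvin_list_coloring(lists))
```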


To end the story let us go a little beyond. The reader may have noticed that
the graph $S_n$ arises from a bipartite graph by a simple construction. Take
the complete bipartite graph, denoted by $K_{n,n}$, with $|X| = |Y| = n$, and
all edges between $X$ and $Y$. If we consider the edges of $K_{n,n}$ as vertices
of a new graph, joining two such vertices if and only if as edges in $K_{n,n}$
they have a common endvertex, then we clearly obtain the square graph $S_n$.
Let us say that $S_n$ is the line graph of $K_{n,n}$. Now this same construction
can be performed on any graph $G$ with the resulting graph called the line
graph $L(G)$ of $G$.

[Figure: Construction of a line graph]
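The identification of $S_n$ with the line graph of $K_{n,n}$ can also be confirmed by direct computation. In this illustrative sketch, the vertex labels ("x", i) for rows and ("y", j) for columns are an assumption of the example. It builds $L(K_{n,n})$ straight from the definition and compares the result with $S_n$.

```python
from itertools import combinations


def line_graph(edges):
    """Edges of L(G): the vertices of L(G) are the edges of G, and two of them
    are adjacent iff they share an endvertex."""
    edges = [frozenset(e) for e in edges]
    return {frozenset((e, f)) for e, f in combinations(edges, 2) if e & f}


def complete_bipartite(n):
    """K_{n,n} with left vertices ("x", i) (rows) and right vertices ("y", j) (columns)."""
    return [(("x", i), ("y", j)) for i in range(n) for j in range(n)]


def square_graph(n):
    """S_n: cells (i, j), adjacent iff they lie in the same row or the same column."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {frozenset((a, b)) for a, b in combinations(cells, 2)
            if a[0] == b[0] or a[1] == b[1]}


def as_cell(edge):
    """Identify the edge {("x", i), ("y", j)} of K_{n,n} with the cell (i, j)."""
    labels = dict(edge)
    return (labels["x"], labels["y"])


def line_graph_is_square_graph(n):
    relabeled = {frozenset(as_cell(e) for e in pair)
                 for pair in line_graph(complete_bipartite(n))}
    return relabeled == square_graph(n)


print(line_graph_is_square_graph(3))  # True
```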

In general, call $H$ a line graph if $H = L(G)$ for some graph $G$. Of course,
not every graph is a line graph, an example being the graph $K_{2,4}$ that we
considered earlier, and for this graph we have seen $\chi(K_{2,4}) < \chi_\ell(K_{2,4})$.
But what if $H$ is a line graph? By adapting the proof of our theorem it can
easily be shown that $\chi(H) = \chi_\ell(H)$ holds whenever $H$ is the line graph of
a bipartite graph, and the method may well go some way in verifying the
supreme conjecture in this field:


Does $\chi(H) = \chi_\ell(H)$ hold for every line graph $H$?


Very little is known about this conjecture, and things look hard — but after 
all, so did the Dinitz problem twenty years ago. 
Five-coloring plane graphs


Plane graphs and their colorings have been the subject of intensive research 
since the beginnings of graph theory because of their connection to the four- 
color problem. As stated originally the four-color problem asked whether it 
is always possible to color the regions of a plane map with four colors such 
that regions which share a common boundary (and not just a point) receive 
different colors. The figure on the right shows that coloring the regions of a 
map is really the same task as coloring the vertices of a plane graph. As in 
Chapter 13 (page 89) place a vertex in the interior of each region (including 
the outer region) and connect two such vertices belonging to neighboring 
regions by an edge through the common boundary. 

The resulting graph $G$, the dual graph of the map, is then a plane graph,
and coloring the vertices of $G$ in the usual sense is the same as coloring
the regions of the map. So we may as well concentrate on vertex-coloring plane
graphs and will do so from now on. Note that we may assume that $G$ has
no loops or multiple edges, since these are irrelevant for coloring.


In the long and arduous history of attempts to prove the four-color theorem
many came close, but what finally succeeded in the Appel–Haken
proof of 1976 and also in the more recent proof of Robertson, Sanders,
Seymour and Thomas (1997) was a combination of very old ideas (dating
back to the 19th century) and the very new calculating powers of modern-day
computers. Twenty-five years after the original proof, the situation
is still basically the same; there is even a computer-generated, computer-checkable
proof due to Gonthier, but no proof from The Book is in sight.
So let us be more modest and ask whether there is a neat proof that every
plane graph can be 5-colored. A proof of this five-color theorem had already
been given by Heawood at the turn of the century. The basic tool for
his proof (and indeed also for the four-color theorem) was Euler’s formula
(see Chapter 13). Clearly, when coloring a graph $G$ we may assume that $G$
is connected since we may color the connected pieces separately. A plane
graph divides the plane into a set $R$ of regions (including the exterior region).
Euler’s formula states that for plane connected graphs $G = (V, E)$
we always have

$$|V| - |E| + |R| = 2.$$
As a warm-up, let us see how Euler’s formula may be applied to prove
that every plane graph $G$ is 6-colorable. We proceed by induction on the
number $n$ of vertices. For small values of $n$ (in particular, for $n \le 6$) this
is obvious.
[Figure: The dual graph of a map]

[Figure: This plane graph has 8 vertices, 13 edges and 7 regions.]

[Figure: A near-triangulated plane graph]


From part (A) of the proposition on page 91 we know that $G$ has a vertex $v$
of degree at most 5. Delete $v$ and all edges incident with $v$. The resulting
graph $G' = G \setminus v$ is a plane graph on $n-1$ vertices. By induction, it can be
6-colored. Since $v$ has at most 5 neighbors in $G$, at most 5 colors are used
for these neighbors in the coloring of $G'$. So we can extend any 6-coloring
of $G'$ to a 6-coloring of $G$ by assigning a color to $v$ which is not used for
any of its neighbors in the coloring of $G'$. Thus $G$ is indeed 6-colorable.
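Read as an algorithm, the induction peels off a vertex of degree at most 5 and colors it last. Below is a minimal Python sketch, assuming the graph is given as an adjacency dictionary of a simple plane graph. Planarity is not checked. The code relies only on always finding a vertex of degree at most 5, which part (A) guarantees for plane graphs. The triangulated grid is a hypothetical test input.

```python
def six_color(adj):
    """6-color a simple plane graph given as {vertex: set of neighbors}.

    Mirrors the induction: repeatedly delete a vertex of degree <= 5,
    then color the vertices in the reverse order of deletion.
    """
    remaining = {v: set(nb) for v, nb in adj.items()}
    order = []
    while remaining:
        v = min(remaining, key=lambda u: len(remaining[u]))
        if len(remaining[v]) > 5:
            raise ValueError("no vertex of degree <= 5, so the graph cannot be planar")
        order.append(v)
        for u in remaining.pop(v):
            remaining[u].discard(v)
    color = {}
    for v in reversed(order):
        # at most 5 neighbors of v are already colored, so one of 0..5 is free
        used = {color[u] for u in adj[v] if u in color}
        color[v] = min(c for c in range(6) if c not in used)
    return color


def triangulated_grid(k):
    """Hypothetical test input: a k x k grid with one diagonal in every square (a plane graph)."""
    adj = {(r, c): set() for r in range(k) for c in range(k)}
    for r in range(k):
        for c in range(k):
            for dr, dc in ((0, 1), (1, 0), (1, 1)):
                if r + dr < k and c + dc < k:
                    adj[(r, c)].add((r + dr, c + dc))
                    adj[(r + dr, c + dc)].add((r, c))
    return adj


grid = triangulated_grid(6)
coloring = six_color(grid)
print(all(coloring[u] != coloring[v] for u in grid for v in grid[u]))  # True
```

The same loop works with lists of 6 colors per vertex (choose any listed color not used by the at most 5 colored neighbors), which is the observation $\chi_\ell(G) \le 6$ made below.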


Now let us look at the list chromatic number of plane graphs, which we
have discussed in the chapter on the Dinitz problem. Clearly, our 6-coloring
method works for lists of colors as well (again we never run out of colors),
so $\chi_\ell(G) \le 6$ holds for any plane graph $G$. Erdős, Rubin and Taylor
conjectured in 1979 that every plane graph has list chromatic number at
most 5, and further that there are plane graphs $G$ with $\chi_\ell(G) > 4$. They
were right on both counts. Margit Voigt was the first to construct an example
of a plane graph $G$ with $\chi_\ell(G) = 5$ (her example had 238 vertices)
and around the same time Carsten Thomassen gave a truly stunning proof
of the 5-list coloring conjecture. His proof is a telling example of what you
can do when you find the right induction hypothesis. It does not use Euler’s
formula at all!


Theorem. All planar graphs $G$ can be 5-list colored:

$$\chi_\ell(G) \le 5.$$


Proof. First note that adding edges can never decrease the list chromatic
number. In other words, when $H$ is a subgraph of $G$, then $\chi_\ell(H) \le \chi_\ell(G)$
certainly holds. Hence we may assume that $G$ is connected and that all
the bounded faces of an embedding have triangles as boundaries. Let us
call such a graph near-triangulated. The validity of the theorem for
near-triangulated graphs will establish the statement for all plane graphs.

The trick of the proof is to show the following stronger statement (which
allows us to use induction):

Let $G = (V,E)$ be a near-triangulated graph, and let $B$ be the
cycle bounding the outer region. We make the following assumptions
on the color sets $C(v)$, $v \in V$:

(1) Two adjacent vertices $x, y$ of $B$ are already colored with
(different) colors $\alpha$ and $\beta$.

(2) $|C(v)| \ge 3$ for all other vertices $v$ of $B$.

(3) $|C(v)| \ge 5$ for all vertices $v$ in the interior.

Then the coloring of $x, y$ can be extended to a proper coloring of $G$
by choosing colors from the lists. In particular, $\chi_\ell(G) \le 5$.
For $|V| = 3$ this is obvious, since for the only uncolored vertex $v$ we have
$|C(v)| \ge 3$, so there is a color available. Now we proceed by induction.


Case 1: Suppose $B$ has a chord, that is, an edge not in $B$ that joins two
vertices $u, v \in B$. The subgraph $G_1$, which is bounded by $B_1 \cup \{uv\}$
and contains $x, y, u$ and $v$, is near-triangulated and therefore has a 5-list
coloring by induction. Suppose in this coloring the vertices $u$ and $v$ receive
the colors $\gamma$ and $\delta$. Now we look at the bottom part $G_2$ bounded by $B_2$ and
$uv$. Regarding $u, v$ as pre-colored, we see that the induction hypotheses
are also satisfied for $G_2$. Hence $G_2$ can be 5-list colored with the available
colors, and thus the same is true for $G$.


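Case 2: Suppose $B$ has no chord. Let $v_0$ be the vertex on the other side of
the $\alpha$-colored vertex $x$ on $B$, and let $x, v_1, \dots, v_t, w$ be the neighbors of $v_0$.
Since $G$ is near-triangulated we have the situation shown in the figure.

Construct the near-triangulated graph $G' = G \setminus v_0$ by deleting from $G$ the
vertex $v_0$ and all edges emanating from $v_0$. This $G'$ has as outer boundary
$B' = (B \setminus v_0) \cup \{v_1, \dots, v_t\}$. Since $|C(v_0)| \ge 3$ by assumption (2) there
exist two colors $\gamma, \delta$ in $C(v_0)$ different from $\alpha$. Now we replace every
color set $C(v_i)$ by $C(v_i) \setminus \{\gamma, \delta\}$, keeping the original color sets for all other
vertices in $G'$. Since $B$ has no chord, the vertices $v_1, \dots, v_t$ lie in the interior
of $G$, so their lists shrink from at least 5 to at least 3 colors. Then $G'$ clearly
satisfies all assumptions and is thus 5-list colorable by induction. Choosing $\gamma$
or $\delta$ for $v_0$, different from the color of $w$, we can extend the list coloring of
$G'$ to all of $G$: the color of $v_0$ differs from $\alpha$ (the color of $x$), from the colors
of $v_1, \dots, v_t$ (from whose lists $\gamma$ and $\delta$ were removed), and from that of $w$.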


So, the 5-list color theorem is proved, but the story is not quite over. A
stronger conjecture claimed that the list-chromatic number of a plane graph
$G$ is at most 1 more than the ordinary chromatic number:

Is $\chi_\ell(G) \le \chi(G) + 1$ for every plane graph $G$?

Since $\chi(G) \le 4$ by the four-color theorem, we have three cases:

Case I: $\chi(G) = 2 \Rightarrow \chi_\ell(G) \le 3$
Case II: $\chi(G) = 3 \Rightarrow \chi_\ell(G) \le 4$
Case III: $\chi(G) = 4 \Rightarrow \chi_\ell(G) \le 5$.


Thomassen’s result settles Case III, and Case I was proved by an ingenious
(and much more sophisticated) argument by Alon and Tarsi. Furthermore,
there are plane graphs $G$ with $\chi(G) = 2$ and $\chi_\ell(G) = 3$, for example the
graph $K_{2,4}$ that we considered in the chapter on the Dinitz problem.
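List colorings of small graphs can be checked by brute force. The sketch below uses one standard choice of 2-element lists on $K_{2,4}$, which may differ from the lists shown in the Dinitz chapter, to confirm $\chi_\ell(K_{2,4}) > 2$. It does not prove the matching upper bound $\chi_\ell(K_{2,4}) \le 3$, since that would require checking every assignment of 3-element lists.

```python
from itertools import product


def list_colorable(edges, lists):
    """Brute force: is there a proper coloring choosing each vertex's color from its list?"""
    vertices = list(lists)
    for choice in product(*(lists[v] for v in vertices)):
        color = dict(zip(vertices, choice))
        if all(color[u] != color[v] for u, v in edges):
            return True
    return False


# K_{2,4}: a1, a2 on one side, b1, ..., b4 on the other
edges = [(a, b) for a in ("a1", "a2") for b in ("b1", "b2", "b3", "b4")]
bad_lists = {"a1": {1, 2}, "a2": {3, 4},
             "b1": {1, 3}, "b2": {1, 4}, "b3": {2, 3}, "b4": {2, 4}}
print(list_colorable(edges, bad_lists))  # False
```

Whatever colors $a_1$ and $a_2$ take, some $b_k$ has exactly those two colors as its list, which is why the search fails.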




But what about Case II? Here the conjecture fails: This was first shown
by Margit Voigt for a graph that was earlier constructed by Shai Gutner.
His graph on 130 vertices can be obtained as follows. First we look at
the “double octahedron” (see the figure), which is clearly 3-colorable. Let
$\alpha \in \{5,6,7,8\}$ and $\beta \in \{9,10,11,12\}$, and consider the lists that are given
in the figure. You are invited to check that with these lists a coloring is not
possible. Now take 16 copies of this graph, and identify all top vertices and
all bottom vertices. This yields a graph on $16 \cdot 8 + 2 = 130$ vertices which
is still plane and 3-colorable. We assign $\{5,6,7,8\}$ to the top vertex and
$\{9,10,11,12\}$ to the bottom vertex, with the inner lists corresponding to
the 16 pairs $(\alpha, \beta)$, $\alpha \in \{5,6,7,8\}$, $\beta \in \{9,10,11,12\}$. For every choice
of $\alpha$ and $\beta$ we thus obtain a subgraph as in the figure, and so a list coloring
of the big graph is not possible.

By modifying another one of Gutner’s examples, Voigt and Wirth came up
with an even smaller plane graph with 75 vertices and $\chi = 3$, $\chi_\ell = 5$, which
in addition uses only the minimal number of 5 colors in the combined lists.
The current record is 63 vertices — achieved in 1996 by a young Iranian 
Math Olympiad participant, Maryam Mirzakhani, who in 2014 became the 
first woman ever to receive a Fields Medal. 

To close let us remark that Victor Campos and Frédéric Havet have recently 
extended Thomassen’s theorem by showing that every graph that can be 
drawn in the plane with at most two crossings is still 5-list colorable. 
How to guard a museum


Here is an appealing problem which was raised by Victor Klee in 1973. 
Suppose the manager of a museum wants to make sure that at all times 
every point of the museum is watched by a guard. The guards are stationed 
at fixed posts, but they are able to turn around. How many guards are 
needed? 

We picture the walls of the museum as a polygon consisting of n sides. 
Of course, if the polygon is convex, then one guard is enough. In fact, the 
guard may be stationed at any point of the museum. But, in general, the 
walls of the museum may have the shape of any closed polygon. 

Consider a comb-shaped museum with $n = 3m$ walls, as depicted on the
right. It is easy to see that this requires at least $m = n/3$ guards. Indeed,
notice that the point 1 can only be observed by a
guard stationed in the shaded triangle containing 1, and similarly for the
other points $2, 3, \dots, m$. Since all these triangles are disjoint we conclude
that at least $m$ guards are needed. But $m$ guards are also enough, since they
can be placed at the top lines of the triangles. By cutting off one or two
walls at the end, we conclude that for any $n$ there is an $n$-walled museum
which requires $\lfloor n/3 \rfloor$ guards.
[Figure: A real life art gallery...]

[Figure: A museum with n = 12 walls]

[Figure: A triangulation of the museum]

[Figure: Schönhardt’s polyhedron: The interior dihedral angles at the edges AB', BC' and CA' are greater than 180°.]


The following result states that this is the worst case. 


Theorem. For any museum with $n$ walls, $\lfloor n/3 \rfloor$ guards suffice.


This “art gallery theorem” was first proved by Vašek Chvátal by a clever
argument, but here is a proof due to Steve Fisk that is truly beautiful.


Proof. First of all, let us draw $n-3$ noncrossing diagonals between
corners of the walls until the interior is triangulated. For example, we can
draw 9 diagonals in the museum depicted in the margin to produce a triangulation.
It does not matter which triangulation we choose, any one will do.
Now think of the new figure as a plane graph with the corners as vertices
and the walls and diagonals as edges.


Claim. This graph is 3-colorable. 


For n = 3 there is nothing to prove. Now for n > 3 pick any two vertices 
u and v which are connected by a diagonal. This diagonal will split the 
graph into two smaller triangulated graphs both containing the edge uv. By 
induction we may color each part with 3 colors where we may choose color 
1 for u and color 2 for v in each coloring. Pasting the colorings together 
yields a 3-coloring of the whole graph. 


The rest is easy. Since there are $n$ vertices, at least one of the color classes,
say the vertices colored 1, contains at most $\lfloor n/3 \rfloor$ vertices, and this is where
we place the guards. Every triangle contains a vertex of color 1 (its three corners
receive three different colors), and a guard at a corner of a triangle sees the whole
triangle. We infer that every triangle is guarded, and hence so is the whole museum.
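Fisk's argument translates directly into a procedure. Color one triangle, then walk across diagonals, giving each newly reached corner the one color its triangle is still missing. Finally pick the smallest color class. The sketch below assumes the triangulation is already given as a list of corner triples, for instance the hypothetical heptagon triangulation in the example. As in the proof, it relies on neighboring triangles sharing exactly one diagonal.

```python
from collections import deque


def fisk_guards(triangles):
    """3-color the corners of a triangulated polygon; return (coloring, smallest color class).

    triangles: list of corner triples; neighboring triangles share a diagonal.
    """
    color = dict(zip(triangles[0], range(3)))
    seen = {0}
    queue = deque([0])
    while queue:
        t = queue.popleft()
        for k, tri in enumerate(triangles):
            shared = set(triangles[t]) & set(tri)
            if k not in seen and len(shared) == 2:
                (w,) = set(tri) - shared
                # with colors 0, 1, 2 the missing color is 3 minus the sum of the other two
                color[w] = 3 - sum(color[u] for u in shared)
                seen.add(k)
                queue.append(k)
    classes = [[v for v in color if color[v] == c] for c in range(3)]
    return color, min(classes, key=len)


# hypothetical triangulation of a heptagon with corners 0..6 (diagonals 02, 26, 36, 35)
heptagon = [(0, 1, 2), (0, 2, 6), (2, 3, 6), (3, 5, 6), (3, 4, 5)]
color, guards = fisk_guards(heptagon)
print(guards)  # [0, 3]
```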


The astute reader may have noticed a subtle point in our reasoning. Does
a triangulation always exist? Probably everybody’s first reaction is: Obviously,
yes! Well, it does exist, but this is not completely obvious, and,
in fact, the natural generalization to three dimensions (partitioning into
tetrahedra) is false! This may be seen from Schönhardt’s polyhedron, depicted
on the left. It is obtained from a triangular prism by rotating the
top triangle, so that each of the quadrilateral faces breaks into two triangles
with a nonconvex edge. Try to triangulate this polyhedron! You will notice
that any tetrahedron that contains the bottom triangle must contain one of
the three top vertices: but the resulting tetrahedron will not be contained in
Schönhardt’s polyhedron. So there is no triangulation without an additional
vertex.


To prove that a triangulation exists in the case of a planar nonconvex 
polygon, we proceed by induction on the number n of vertices. For n = 3 
the polygon is a triangle, and there is nothing to prove. Let $n \ge 4$. To
use induction, all we have to produce is one diagonal which will split the 
polygon P into two smaller parts, such that a triangulation of the polygon 
can be pasted together from triangulations of the parts. 

Call a vertex $A$ convex if the interior angle at the vertex is less than $180^\circ$.
Since the sum of the interior angles of $P$ is $(n-2) \cdot 180^\circ$, there must be a
convex vertex $A$. In fact, there must be at least three of them: In essence
this is an application of the pigeonhole principle! Or you may consider the
convex hull of the polygon, and note that all its vertices are convex also for
the original polygon.

Now look at the two neighboring vertices B and C of A. If the segment 
BC lies entirely in P, then this is our diagonal. If not, the triangle ABC 
contains other vertices. Slide BC towards A until it hits the last vertex Z 
in ABC. Now AZ is within P, and we have a diagonal. 
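The existence proof is constructive, so it can be turned into a recursive triangulation routine. This Python sketch follows the two cases of the argument. It assumes a simple polygon whose corners are given in counterclockwise order as numeric coordinates. Degenerate configurations, such as several collinear corners, get no special care in the closed point-in-triangle test. The lexicographically smallest corner serves as the convex vertex $A$, since it lies on the convex hull.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive iff o, a, b make a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p, a, b, c):
    """Closed point-in-triangle test for a counterclockwise triangle abc."""
    return cross(a, b, p) >= 0 and cross(b, c, p) >= 0 and cross(c, a, p) >= 0


def triangulate(idx, pts):
    """Triangulate the simple polygon whose corners pts[k], k in idx, are listed counterclockwise."""
    m = len(idx)
    if m == 3:
        return [tuple(idx)]
    # The lexicographically smallest corner lies on the convex hull, hence is convex.
    a = min(range(m), key=lambda k: pts[idx[k]])
    b, c = (a - 1) % m, (a + 1) % m
    A, B, C = pts[idx[a]], pts[idx[b]], pts[idx[c]]
    inside = [k for k in range(m)
              if k not in (a, b, c) and in_triangle(pts[idx[k]], B, A, C)]
    if not inside:
        # BC is a diagonal: cut off the triangle ABC.
        return [(idx[b], idx[a], idx[c])] + triangulate(idx[:a] + idx[a + 1:], pts)
    # Slide BC towards A: the last corner hit is the one farthest from the line BC.
    z = max(inside, key=lambda k: abs(cross(B, C, pts[idx[k]])))
    lo, hi = sorted((a, z))
    # AZ is a diagonal: split the cyclic list of corners at positions a and z.
    return triangulate(idx[lo:hi + 1], pts) + triangulate(idx[hi:] + idx[:lo + 1], pts)


dart = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]
print(triangulate(list(range(len(dart))), dart))  # [(3, 0, 1), (1, 2, 3), (3, 4, 0)]
```

Each recursive call receives a polygon with fewer corners, exactly as in the induction, and the result always has $n - 2$ triangles.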


There are many variants to the art gallery theorem. For example, we may 
only want to guard the walls (which is, after all, where the paintings hang), 
or the guards are all stationed at vertices. A particularly nice (unsolved) 
variant goes as follows: 


Suppose each guard may patrol one wall of the museum, so he 
walks along his wall and sees anything that can be seen from any 
point along this wall. 

How many “wall guards” do we then need to keep control? 


Godfried Toussaint constructed the example of a museum displayed here
which shows that $\lfloor n/4 \rfloor + 1$ guards may be necessary.

This polygon has 28 sides (and, in general, $4m$ sides), and the reader is invited
to check that $m + 1$ wall-guards are needed. It is conjectured that, except
for some small values of $n$, this number is also sufficient, but a proof, let
alone a Book Proof, is still missing.
[Figure: “Museum guards” (A 3-dimensional art-gallery problem)]


Turán’s graph theorem (Chapter 41)


One of the fundamental results in graph theory is the theorem of Turán
from 1941, which initiated extremal graph theory. Turán’s theorem was
rediscovered many times with various different proofs. We will discuss five
of them and let the reader decide which one belongs in The Book.

Let us fix some notation. We consider simple graphs $G$ on the vertex set
$V = \{v_1, \dots, v_n\}$ and edge set $E$. If $v_i$ and $v_j$ are neighbors, then we
write $v_iv_j \in E$. A $p$-clique in $G$ is a complete subgraph of $G$ on $p$ vertices,
denoted by $K_p$. Paul Turán posed the following question:


Suppose G is a simple graph that does not contain a p-clique. 
What is the largest number of edges that G can have? 


We readily obtain examples of such graphs by dividing $V$ into $p-1$ pairwise
disjoint subsets $V = V_1 \cup \dots \cup V_{p-1}$, $|V_i| = n_i$, $n = n_1 + \dots + n_{p-1}$,
joining two vertices if and only if they lie in distinct sets $V_i, V_j$. We denote
the resulting graph by $K_{n_1, \dots, n_{p-1}}$; it has $\sum_{i<j} n_in_j$ edges. We obtain a
maximal number of edges among such graphs with given $n$ if we divide
the numbers $n_i$ as evenly as possible, that is, if $|n_i - n_j| \le 1$ for all $i, j$.
Indeed, suppose $n_1 \ge n_2 + 2$. By shifting one vertex from $V_1$ to $V_2$, we
obtain $K_{n_1-1, n_2+1, \dots, n_{p-1}}$, which contains
$(n_1 - 1)(n_2 + 1) - n_1n_2 = n_1 - n_2 - 1 \ge 1$ more edges than $K_{n_1, n_2, \dots, n_{p-1}}$.
Let us call the graphs $K_{n_1, \dots, n_{p-1}}$ with $|n_i - n_j| \le 1$ the Turán graphs.
In particular, if $p-1$ divides $n$, then we may choose $n_i = \frac{n}{p-1}$ for all $i$, obtaining

$$\binom{p-1}{2}\Big(\frac{n}{p-1}\Big)^2 = \Big(1 - \frac{1}{p-1}\Big)\frac{n^2}{2}$$

edges. Turán’s theorem now states that this number is an upper bound for
the edge-number of any graph on $n$ vertices without a $p$-clique.

[Figure: Paul Turán]

[Figure: The graph $K_{2,2,3}$]
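The edge count of the Turán graphs and the effect of the vertex-shifting step are easy to tabulate. The short sketch below uses illustrative function names. It splits $n$ vertices into $p-1$ balanced parts, counts $\sum_{i<j} n_in_j$ as $\big(n^2 - \sum_i n_i^2\big)/2$, and compares the result with the bound $\big(1 - \frac{1}{p-1}\big)\frac{n^2}{2}$ using exact fractions.

```python
from fractions import Fraction


def turan_parts(n, r):
    """Sizes of r parts of n vertices, as equal as possible (|n_i - n_j| <= 1)."""
    q, s = divmod(n, r)
    return [q + 1] * s + [q] * (r - s)


def multipartite_edges(parts):
    """Edges of K_{n_1,...,n_r}: sum_{i<j} n_i n_j = (n^2 - sum n_i^2) / 2."""
    n = sum(parts)
    return (n * n - sum(x * x for x in parts)) // 2


def turan_bound(n, p):
    """The bound (1 - 1/(p-1)) n^2 / 2 as an exact fraction."""
    return (1 - Fraction(1, p - 1)) * n * n / 2


print(turan_parts(7, 3), multipartite_edges(turan_parts(7, 3)), turan_bound(7, 4))
# [3, 2, 2] 16 49/3
print(multipartite_edges([4, 1, 2]))  # 14: an uneven split of 7 vertices loses edges
```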


Theorem. If a graph $G = (V,E)$ on $n$ vertices has no $p$-clique, $p \ge 2$,
then

$$|E| \le \Big(1 - \frac{1}{p-1}\Big)\frac{n^2}{2}. \qquad (1)$$
For $p = 2$ this is trivial. In the first interesting case $p = 3$ the theorem states
that a triangle-free graph on $n$ vertices contains at most $\frac{n^2}{4}$ edges. Proofs
of this special case were known prior to Turán’s result. Two elegant proofs
using inequalities are contained in Chapter 20.
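For very small $n$ the theorem can be confirmed exhaustively. The following sketch enumerates all graphs on $n \le 6$ labelled vertices, finds the largest edge count without a $p$-clique, and compares it with the Turán graph. It is a sanity check rather than a proof, and it becomes infeasible quickly as $n$ grows.

```python
from itertools import combinations


def has_clique(edge_set, n, p):
    """Does the graph on {0, ..., n-1} with the given edges contain a p-clique?"""
    return any(all(frozenset(e) in edge_set for e in combinations(S, 2))
               for S in combinations(range(n), p))


def max_edges_without_clique(n, p):
    """Exhaustive search over all 2^(n(n-1)/2) labelled graphs; only sensible for n <= 6."""
    pairs = [frozenset(e) for e in combinations(range(n), 2)]
    best = 0
    for mask in range(1 << len(pairs)):
        edge_set = {pairs[k] for k in range(len(pairs)) if mask >> k & 1}
        if len(edge_set) > best and not has_clique(edge_set, n, p):
            best = len(edge_set)
    return best


def turan_edges(n, p):
    """Edge count of the Turán graph with p - 1 balanced parts."""
    q, s = divmod(n, p - 1)
    parts = [q + 1] * s + [q] * (p - 1 - s)
    return (n * n - sum(x * x for x in parts)) // 2


print(max_edges_without_clique(5, 3), turan_edges(5, 3))  # 6 6
```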
Let us turn to the general case. The first two proofs use induction and are
due to Turán and to Erdős, respectively.


First proof. We use induction on $n$. One easily computes that (1) is true
for $n < p$. Let $G$ be a graph on $V = \{v_1, \dots, v_n\}$ without $p$-cliques with
a maximal number of edges, where $n \ge p$. $G$ certainly contains $(p-1)$-cliques,
since otherwise we could add edges. Let $A$ be a $(p-1)$-clique, and
set $B := V \setminus A$.

$A$ contains $\binom{p-1}{2}$ edges, and we now estimate the edge-number $e_B$ in $B$
and the edge-number $e_{A,B}$ between $A$ and $B$. By induction, we have
$e_B \le \frac12\big(1 - \frac{1}{p-1}\big)(n-p+1)^2$. Since $G$ has no $p$-clique, every $v_j \in B$ is adjacent
to at most $p-2$ vertices in $A$, and we obtain $e_{A,B} \le (p-2)(n-p+1)$.
Altogether, this yields

$$|E| \le \binom{p-1}{2} + \frac12\Big(1 - \frac{1}{p-1}\Big)(n-p+1)^2 + (p-2)(n-p+1),$$

which is precisely $\big(1 - \frac{1}{p-1}\big)\frac{n^2}{2}$.
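For readers who want the omitted algebra in this last step, here is one way to carry it out. It writes $q = p - 1$, a substitution introduced only for this calculation, and uses $\binom{q}{2} = \frac{q-1}{2q}\,q^2$ and $(q-1)(n-q) = \frac{q-1}{2q}\,2q(n-q)$:

```latex
\begin{aligned}
\binom{q}{2} + \frac{q-1}{2q}(n-q)^2 + (q-1)(n-q)
  &= \frac{q-1}{2q}\Big(q^2 + 2q(n-q) + (n-q)^2\Big) \\
  &= \frac{q-1}{2q}\,n^2
   = \Big(1 - \frac{1}{p-1}\Big)\frac{n^2}{2}.
\end{aligned}
```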


Second proof. This proof makes use of the structure of the Turán
graphs. Let $v_m \in V$ be a vertex of maximal degree $d_m = \max_{1 \le j \le n} d_j$.
Denote by $S$ the set of neighbors of $v_m$, $|S| = d_m$, and set $T := V \setminus S$. As
$G$ contains no $p$-clique, and $v_m$ is adjacent to all vertices of $S$, we note that
$S$ contains no $(p-1)$-clique.

We now construct the following graph $H$ on $V$ (see the figure). $H$ corresponds
to $G$ on $S$ and contains all edges between $S$ and $T$, but no edges
within $T$. In other words, $T$ is an independent set in $H$, and we conclude
that $H$ has again no $p$-cliques, since a clique of $H$ contains at most one vertex
of $T$ and at most $p-2$ vertices of $S$. Let $d'_j$ be the degree of $v_j$ in $H$.
If $v_j \in S$, then we certainly have $d'_j \ge d_j$ by the construction of $H$, and
for $v_j \in T$, we see $d'_j = |S| = d_m \ge d_j$ by the choice of $v_m$. We infer
$|E(H)| \ge |E|$, and find that among all graphs with a maximal number
of edges, there must be one of the form of $H$. By induction, the graph
induced by $S$ has at most as many edges as a suitable graph $K_{n_1, \dots, n_{p-2}}$
on $S$. So $|E| \le |E(H)| \le |E(K_{n_1, \dots, n_{p-1}})|$ with $n_{p-1} = |T|$, which implies (1).


The next two proofs are of a totally different nature, using a maximizing 
argument and ideas from probability theory. They are due to Motzkin and 
Straus and to Alon and Spencer, respectively. 


Third proof. Consider a probability distribution $w = (w_1, \dots, w_n)$
on the vertices, that is, an assignment of values $w_i \ge 0$ to the vertices with
$\sum_{i=1}^n w_i = 1$. Our goal is to maximize the function

$$f(w) = \sum_{v_iv_j \in E} w_iw_j.$$

Suppose $w$ is any distribution, and let $v_i$ and $v_j$ be a pair of nonadjacent
vertices with positive weights $w_i, w_j$. Let $s_i$ be the sum of the weights of
all vertices adjacent to $v_i$, and define $s_j$ similarly for $v_j$, where we may
assume that $s_i \ge s_j$. Now we move the weight from $v_j$ to $v_i$, that is, the
new weight of $v_i$ is $w_i + w_j$, while the weight of $v_j$ drops to 0. For the
new distribution $w'$ we find

$$f(w') = f(w) + w_js_i - w_js_j \ge f(w).$$

We repeat this (reducing the number of vertices with a positive weight by
one in each step) until there are no nonadjacent vertices of positive weight
anymore. Thus we conclude that there is an optimal distribution whose
nonzero weights are concentrated on a clique, say on a $k$-clique. Now if,
say, $w_1 > w_2 > 0$, then choose $\varepsilon$ with $0 < \varepsilon < w_1 - w_2$ and change $w_1$
to $w_1 - \varepsilon$ and $w_2$ to $w_2 + \varepsilon$. The new distribution $w'$ satisfies
$f(w') = f(w) + \varepsilon(w_1 - w_2) - \varepsilon^2 > f(w)$, and we infer that the maximal value of
$f(w)$ is attained for $w_i = \frac1k$ on a $k$-clique and $w_i = 0$ otherwise. Since a
$k$-clique contains $\frac{k(k-1)}{2}$ edges, we obtain

$$f(w) = \frac{k(k-1)}{2} \cdot \frac{1}{k^2} = \frac12\Big(1 - \frac1k\Big).$$

Since this expression is increasing in $k$, the best we can do is to set $k = p-1$
(since $G$ has no $p$-cliques). So we conclude

$$f(w) \le \frac12\Big(1 - \frac{1}{p-1}\Big)$$

for any distribution $w$. In particular, this inequality holds for the uniform
distribution given by $w_i = \frac1n$ for all $i$. Thus we find

$$\frac{|E|}{n^2} = f\Big(w_i = \frac1n\Big) \le \frac12\Big(1 - \frac{1}{p-1}\Big),$$

which is precisely (1).
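The weight-moving step can be simulated with exact arithmetic. The sketch below uses hypothetical function names. It repeatedly picks a pair of nonadjacent vertices with positive weight, moves all the weight onto the one with the larger neighbor sum, and records $f$ after each move. This lets one watch $f$ never decrease while the support shrinks to a clique. The 5-cycle is only an example input.

```python
from fractions import Fraction
from itertools import combinations


def f(w, edges):
    """f(w) = sum over edges v_i v_j of w_i * w_j."""
    return sum(w[i] * w[j] for i, j in edges)


def shift_to_clique(w, edges):
    """Repeat the weight-moving step until the positive weights sit on a clique.

    Returns the final distribution and the values f took along the way.
    """
    n = len(w)
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    w = list(w)
    history = [f(w, edges)]
    while True:
        pair = next(((i, j) for i, j in combinations(range(n), 2)
                     if w[i] > 0 and w[j] > 0 and j not in adj[i]), None)
        if pair is None:
            return w, history
        i, j = pair
        s_i = sum(w[k] for k in adj[i])
        s_j = sum(w[k] for k in adj[j])
        if s_i < s_j:  # relabel so that s_i >= s_j
            i, j = j, i
        w[i], w[j] = w[i] + w[j], 0  # move all of w_j onto v_i
        history.append(f(w, edges))


c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
w, history = shift_to_clique([Fraction(1, 5)] * 5, c5)
print([str(x) for x in history])  # ['1/5', '1/5', '6/25', '6/25']
```

The 5-cycle has no triangle, so $p = 3$. On this input the walk ends on a single edge with $f = 6/25$, below the bound $\frac12\big(1 - \frac12\big) = \frac14$. The equalizing step of the proof would then raise $f$ to exactly $\frac14$.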


Fourth proof. This time we use some concepts from probability theory.
Let $G$ be an arbitrary graph on the vertex set $V = \{v_1, \dots, v_n\}$. Denote the
degree of $v_i$ by $d_i$, and write $\omega(G)$ for the number of vertices in a largest
clique, called the clique number of $G$.

Claim. We have $\displaystyle \omega(G) \ge \sum_{i=1}^n \frac{1}{n-d_i}$.

We choose a random permutation $\pi = v_1v_2 \dots v_n$ of the vertex set $V$,
where each permutation is supposed to appear with the same probability
$\frac{1}{n!}$, and then consider the following set $C_\pi$. We put $v_i$ into $C_\pi$ if and only
if $v_i$ is adjacent to all $v_j$ ($j < i$) preceding $v_i$. By definition, $C_\pi$ is a
clique in $G$. Let $X = |C_\pi|$ be the corresponding random variable. We have
$X = \sum_{i=1}^n X_i$, where $X_i$ is the indicator random variable of the vertex $v_i$,
that is, $X_i = 1$ or $X_i = 0$ depending on whether $v_i \in C_\pi$ or $v_i \notin C_\pi$. Note
that $v_i$ belongs to $C_\pi$ with respect to the permutation $v_1v_2 \dots v_n$ if and only
if $v_i$ appears before all $n - 1 - d_i$ vertices which are not adjacent to $v_i$, or
in other words, if $v_i$ is the first among $v_i$ and its $n - 1 - d_i$ non-neighbors.
The probability that this happens is $\frac{1}{n-d_i}$, hence $EX_i = \frac{1}{n-d_i}$.
Thus by linearity of expectation (see page 116) we obtain

$$EX = \sum_{i=1}^n EX_i = \sum_{i=1}^n \frac{1}{n-d_i}.$$

Consequently, there must be a clique of at least that size, and this was our
claim. To deduce Turán’s theorem from the claim we use the Cauchy–Schwarz
inequality from Chapter 20,

$$\Big(\sum_{i=1}^n a_ib_i\Big)^2 \le \Big(\sum_{i=1}^n a_i^2\Big)\Big(\sum_{i=1}^n b_i^2\Big).$$

Set $a_i = \sqrt{n - d_i}$, $b_i = \frac{1}{\sqrt{n - d_i}}$, then $a_ib_i = 1$, and so

$$n^2 \le \Big(\sum_{i=1}^n (n-d_i)\Big)\Big(\sum_{i=1}^n \frac{1}{n-d_i}\Big) \le \omega(G)\sum_{i=1}^n (n-d_i). \qquad (2)$$

At this point we apply the hypothesis $\omega(G) \le p-1$ of Turán’s theorem.
Using also $\sum_{i=1}^n d_i = 2|E|$ from the chapter on double counting, inequality
(2) leads to

$$n^2 \le (p-1)(n^2 - 2|E|),$$

and this is equivalent to Turán’s inequality.
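Because $EX$ is an average over all $n!$ orderings, the claim can be verified exactly on small graphs by enumerating every permutation. In this sketch, whose helper names are illustrative, the greedy set $C_\pi$ keeps a vertex exactly when it is adjacent to all vertices before it, as in the proof. The average size of $C_\pi$ should then coincide with $\sum_i \frac{1}{n-d_i}$.

```python
from fractions import Fraction
from itertools import permutations


def greedy_clique(order, adj):
    """C_pi: keep a vertex iff it is adjacent to every vertex that precedes it in the order."""
    return [v for k, v in enumerate(order) if all(u in adj[v] for u in order[:k])]


def expected_clique_size(n, edges):
    """Return (exact average of |C_pi| over all n! orders, sum of 1/(n - d_i))."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    orders = list(permutations(range(n)))
    total = sum(len(greedy_clique(order, adj)) for order in orders)
    average = Fraction(total, len(orders))
    bound = sum(Fraction(1, n - len(adj[v])) for v in range(n))
    return average, bound


c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(*expected_clique_size(5, c5))  # 5/3 5/3
```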


Now we are ready for the last proof, which may be the most beautiful of
them all. Its origin is not clear; we got it from Stephan Brandt, who heard
it in Oberwolfach. It may be “folklore” graph theory. It yields in one stroke
that the Turán graph is in fact the unique example with a maximal number
of edges. It may be noted that both proofs 1 and 2 also imply this stronger
result.


Fifth proof. Let $G$ be a graph on $n$ vertices without a $p$-clique and with
a maximal number of edges.

Claim. $G$ does not contain three vertices $u, v, w$ such that $vw \in E$,
but $uv \notin E$, $uw \notin E$.

Suppose otherwise, and consider the following cases.

Case 1: $d(u) < d(v)$ or $d(u) < d(w)$.
We may suppose that $d(u) < d(v)$. Then we duplicate $v$, that is, we create
a new vertex $v'$ which has exactly the same neighbors as $v$ (but $vv'$ is not
an edge), delete $u$, and keep the rest unchanged.
The new graph $G'$ has again no $p$-clique, and for the number of edges we
find

$$|E(G')| = |E(G)| + d(v) - d(u) > |E(G)|,$$

a contradiction.




Case 2: $d(u) \ge d(v)$ and $d(u) \ge d(w)$.

Duplicate $u$ twice and delete $v$ and $w$ (as illustrated in the margin). Again,
the new graph $G'$ has no $p$-clique, and we compute (the $-1$ results from the
edge $vw$):

$$|E(G')| = |E(G)| + 2d(u) - \big(d(v) + d(w) - 1\big) > |E(G)|.$$

So we have a contradiction once more.
A moment’s thought shows that the claim we have proved is equivalent to
the statement that

$$u \sim v \iff uv \notin E(G)$$

defines an equivalence relation. Thus $G$ is a complete multipartite graph,
$G = K_{n_1, \dots, n_{p-1}}$, and we are finished.
Communicating without errors


In 1956, Claude Shannon, the founder of information theory, posed the 
following very interesting question: 


Suppose we want to transmit messages across a channel (where 
some symbols may be distorted) to a receiver. What is the maximum 
rate of transmission such that the receiver may recover the original 
message without errors? 


Let us see what Shannon meant by “channel” and “rate of transmission.”
We are given a set $V$ of symbols, and a message is just a string of symbols
from $V$. We model the channel as a graph $G = (V, E)$, where $V$ is the set
of symbols, and $E$ the set of edges between unreliable pairs of symbols,
that is, symbols which may be confused during transmission. For example,
communicating over a phone in everyday language, we connect the symbols
B and P by an edge since the receiver may not be able to distinguish
them. Let us call $G$ the confusion graph.

The 5-cycle $C_5$ will play a prominent role in our discussion. In this example,
1 and 2 may be confused, but not 1 and 3, etc. Ideally we would like
to use all 5 symbols for transmission, but since we want to communicate
error-free we can (if we only send single symbols) use only one letter
from each pair that might be confused. Thus for the 5-cycle we can use
only two different letters (any two that are not connected by an edge). In the
language of information theory, this means that for the 5-cycle we achieve
an information rate of $\log_2 2 = 1$ (instead of the maximal $\log_2 5 \approx 2.32$).
It is clear that in this model, for an arbitrary graph $G = (V, E)$, the best
we can do is to transmit symbols from a largest independent set. Thus the
information rate, when sending single symbols, is $\log_2 \alpha(G)$, where $\alpha(G)$
is the independence number of $G$.

Let us see whether we can increase the information rate by using larger
strings in place of single symbols. Suppose we want to transmit strings of
length 2. The strings $u_1u_2$ and $v_1v_2$ can only be confused if one of the
following three cases holds:

• $u_1 = v_1$ and $u_2$ can be confused with $v_2$,
• $u_2 = v_2$ and $u_1$ can be confused with $v_1$, or
• $u_1 \ne v_1$ can be confused and $u_2 \ne v_2$ can be confused.


In graph-theoretic terms this amounts to considering the product $G_1 \times G_2$
of two graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$. $G_1 \times G_2$ has the vertex
set $V_1 \times V_2 = \{(u_1, u_2) : u_1 \in V_1, u_2 \in V_2\}$, with $(u_1, u_2) \ne (v_1, v_2)$
connected by an edge if and only if $u_i = v_i$ or $u_iv_i \in E_i$ for $i = 1, 2$. The
confusion graph for strings of length 2 is thus $G^2 = G \times G$, the product of
the confusion graph $G$ for single symbols with itself. The information rate
of strings of length 2 per symbol is then given by

$$\frac{\log_2 \alpha(G^2)}{2} = \log_2 \sqrt{\alpha(G^2)}.$$

[Figure: The graph $C_5 \times C_5$]


Now, of course, we may use strings of any length $n$. The $n$-th confusion
graph $G^n = G \times G \times \dots \times G$ has vertex set $V^n = \{(u_1, \dots, u_n) : u_i \in V\}$
with $(u_1, \dots, u_n) \ne (v_1, \dots, v_n)$ being connected by an edge if $u_i = v_i$ or
$u_iv_i \in E$ for all $i$. The rate of information per symbol determined by
strings of length $n$ is

$$\frac{\log_2 \alpha(G^n)}{n} = \log_2 \sqrt[n]{\alpha(G^n)}.$$


What can we say about $\alpha(G^n)$? Here is a first observation. Let $U \subseteq V$
be a largest independent set in $G$, $|U| = \alpha$. The $\alpha^n$ vertices in $G^n$ of the
form $(u_1, \dots, u_n)$, $u_i \in U$ for all $i$, clearly form an independent set in $G^n$.
Hence

$$\alpha(G^n) \ge \alpha(G)^n,$$

and therefore

$$\sqrt[n]{\alpha(G^n)} \ge \alpha(G),$$

meaning that we never decrease the information rate by using longer strings
instead of single symbols. This, by the way, is a basic idea of coding theory:
By encoding symbols into longer strings we can make error-free communication
more efficient.

Disregarding the logarithm we thus arrive at Shannon’s fundamental
definition: The zero-error capacity of a graph $G$ is given by

$$\Theta(G) = \sup_{n \ge 1} \sqrt[n]{\alpha(G^n)},$$

and Shannon’s problem was to compute $\Theta(G)$, and in particular $\Theta(C_5)$.


Let us look at $C_5$. So far we know $\alpha(C_5) = 2 \le \Theta(C_5)$. Looking at the
5-cycle as depicted earlier, or at the product $C_5 \times C_5$ as drawn on the left,
we see that the set $\{(1,1), (2,3), (3,5), (4,2), (5,4)\}$ is independent in $C_5^2$.
Thus we have $\alpha(C_5^2) \ge 5$. An independent set can contain only two
vertices from any two consecutive rows, and summing over the five pairs of
cyclically consecutive rows counts every vertex twice, so we see that $\alpha(C_5^2) = 5$.
Hence, by using strings of length 2 we have increased the lower bound for the capacity
to $\Theta(C_5) \ge \sqrt{5}$.
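The statements $\alpha(C_5) = 2$ and $\alpha(C_5^2) = 5$ are small enough to check by computer. The sketch below implements the product $G \times G$ exactly as defined above. It computes independence numbers with a simple exact branching search, which works because any maximum independent set contains a chosen vertex $v$ or one of its neighbors. The helper names are illustrative.

```python
from itertools import combinations, product


def cycle(m):
    """Edge set of C_m on the symbols 1, ..., m (labels as in the text)."""
    return {frozenset((k, k % m + 1)) for k in range(1, m + 1)}


def strong_adjacent(a, b, E):
    """Product adjacency: distinct tuples that agree or are confusable in every coordinate."""
    return a != b and all(x == y or frozenset((x, y)) in E for x, y in zip(a, b))


def independence_number(vertices, adj):
    """Exact alpha: a maximum independent set contains v or one of its neighbors."""
    if not vertices:
        return 0
    v = min(vertices, key=lambda x: len(adj[x] & vertices))
    return max(1 + independence_number(vertices - adj[u] - {u}, adj)
               for u in (adj[v] & vertices) | {v})


E = cycle(5)
V1 = set(range(1, 6))
adj1 = {v: {u for u in V1 if strong_adjacent((u,), (v,), E)} for v in V1}
V2 = set(product(V1, repeat=2))
adj2 = {a: {b for b in V2 if strong_adjacent(a, b, E)} for a in V2}
S = [(1, 1), (2, 3), (3, 5), (4, 2), (5, 4)]
print(independence_number(V1, adj1))                         # 2
print(all(b not in adj2[a] for a, b in combinations(S, 2)))  # True
print(independence_number(V2, adj2))                         # 5
```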


So far we have no upper bounds for the capacity. To obtain such bounds
we again follow Shannon’s original ideas. First we need the dual definition
of an independent set. We recall that a subset $C \subseteq V$ is a clique if any
two vertices of $C$ are joined by an edge. Thus the vertices form trivial
cliques of size 1, the edges are the cliques of size 2, the triangles are cliques
of size 3, and so on. Let $\mathcal{C}$ be the set of cliques in $G$. Consider an arbitrary
probability distribution $x = (x_v : v \in V)$ on the set of vertices, that
is, $x_v \ge 0$ and $\sum_{v \in V} x_v = 1$. To every distribution $x$ we associate the
“maximal value of a clique”

$$\lambda(x) = \max_{C \in \mathcal{C}} \sum_{v \in C} x_v,$$

and finally we set

$$\lambda(G) = \min_x \lambda(x) = \min_x \max_{C \in \mathcal{C}} \sum_{v \in C} x_v.$$

To be precise we should use inf instead of min, but the minimum exists
because $\lambda(x)$ is continuous on the compact set of all distributions.

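Consider now an independent set $U \subseteq V$ of maximal size $\alpha(G) = \alpha$.
Associated to $U$ we define the distribution $x_U = (x_v : v \in V)$ by setting
$x_v = \frac{1}{\alpha}$ if $v \in U$ and $x_v = 0$ otherwise. Since any clique contains at most
one vertex from $U$, we infer $\lambda(x_U) = \frac{1}{\alpha}$, and thus by the definition of $\lambda(G)$

$$\lambda(G) \le \frac{1}{\alpha(G)} \quad\text{or}\quad \alpha(G) \le \lambda(G)^{-1}.$$

What Shannon observed is that $\lambda(G)^{-1}$ is, in fact, an upper bound for all
$\sqrt[n]{\alpha(G^n)}$, and hence also for $\Theta(G)$. In order to prove this it suffices to
show that for graphs $G$, $H$

$$\lambda(G \times H) = \lambda(G)\lambda(H) \qquad (1)$$

holds, since this will imply $\lambda(G^n) = \lambda(G)^n$ and hence

$$\alpha(G^n) \le \lambda(G^n)^{-1} = \lambda(G)^{-n}, \qquad \sqrt[n]{\alpha(G^n)} \le \lambda(G)^{-1}.$$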
To prove (1) we make use of the duality theorem of linear programming (see [1]) and get

$$\lambda(G) = \min_x \max_{C \in \mathcal{C}} \sum_{v \in C} x_v = \max_y \min_{v \in V} \sum_{C \ni v} y_C, \qquad (2)$$

where the right-hand side runs through all probability distributions $y = (y_C : C \in \mathcal{C})$ on $\mathcal{C}$.
Consider $G \times H$, and let $x$ and $x'$ be distributions which achieve the minima, $\lambda(x) = \lambda(G)$, $\lambda(x') = \lambda(H)$. In the vertex set of $G \times H$ we assign the value $z_{(u,v)} = x_u x'_v$ to the vertex $(u,v)$. Since $\sum_{(u,v)} z_{(u,v)} = \sum_u x_u \sum_v x'_v = 1$, we obtain a distribution. Next we observe that the maximal cliques in $G \times H$ are of the form $C \times D = \{(u,v) : u \in C, v \in D\}$ where $C$ and $D$ are cliques in $G$ and $H$, respectively. Hence we obtain

$$\lambda(G \times H) \le \lambda(z) = \max_{C \times D} \sum_{(u,v) \in C \times D} z_{(u,v)} = \max_{C \times D} \sum_{u \in C} x_u \sum_{v \in D} x'_v = \lambda(G)\lambda(H)$$

by the definition of $\lambda(G \times H)$. In the same way the converse inequality $\lambda(G \times H) \ge \lambda(G)\lambda(H)$ is shown by using the dual expression for $\lambda(G)$ in (2).

The Lovász umbrella

In summary we can state:

$$\Theta(G) \le \lambda(G)^{-1}$$

for any graph $G$.


Let us apply our findings to the 5-cycle and, more generally, to the $m$-cycle $C_m$ with $m \ge 4$. By using the uniform distribution $(\frac1m, \dots, \frac1m)$ on the vertices, we obtain $\lambda(C_m) \le \frac{2}{m}$, since any clique contains at most two vertices. Similarly, choosing $\frac1m$ for the edges and 0 for the vertices, we have $\lambda(C_m) \ge \frac{2}{m}$ by the dual expression in (2). We conclude that $\lambda(C_m) = \frac{2}{m}$ and therefore

$$\Theta(C_m) \le \frac{m}{2}$$

for all $m$. Now, if $m$ is even, then clearly $\alpha(C_m) = \frac{m}{2}$ and thus also $\Theta(C_m) = \frac{m}{2}$. For odd $m$, however, we have $\alpha(C_m) = \frac{m-1}{2}$. For $m = 3$, $C_3$ is a clique, and so is every product $C_3^n$, implying $\alpha(C_3) = \Theta(C_3) = 1$. So, the first interesting case is the 5-cycle, where we know up to now

$$\sqrt5 \le \Theta(C_5) \le \frac{5}{2}. \qquad (3)$$
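Weak duality is what turns the two choices above into a proof: any feasible $x$ bounds $\lambda(C_m)$ from above and any feasible $y$ bounds it from below. The following sketch (our own helper names, exact rational arithmetic via `fractions.Fraction`) evaluates both certificates; for $m \ge 4$ it only needs the edges, since single vertices are cliques of smaller weight.

```python
from fractions import Fraction

def cycle_edges(m):
    """Edges of C_m on vertices 0..m-1; for m >= 4 these are its maximal cliques."""
    return [frozenset({i, (i + 1) % m}) for i in range(m)]

def primal_value(m):
    """lambda(x) for the uniform vertex distribution x_v = 1/m (an upper bound).

    Single vertices are cliques too, but their weight 1/m is smaller than an edge's.
    """
    x = [Fraction(1, m)] * m
    return max(sum(x[v] for v in edge) for edge in cycle_edges(m))

def dual_value(m):
    """min over v of the weight of the cliques containing v, for y = 1/m on each
    edge and 0 on all other cliques (a lower bound, by weak duality)."""
    y = {edge: Fraction(1, m) for edge in cycle_edges(m)}
    return min(sum(w for edge, w in y.items() if v in edge) for v in range(m))

for m in range(4, 10):
    lower, upper = dual_value(m), primal_value(m)
    print(m, lower, upper, 1 / upper)   # equal bounds, so lambda(C_m) = 2/m
```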


Using his linear programming approach (and some other ideas) Shannon was able to compute the capacity of many graphs and, in particular, of all graphs with five or fewer vertices, with the single exception of $C_5$, where he could not go beyond the bounds in (3). This is where things stood for more than 20 years until László Lovász showed by an astonishingly simple argument that indeed $\Theta(C_5) = \sqrt5$. A seemingly very difficult combinatorial problem was provided with an unexpected and elegant solution.

Lovász' main new idea was to represent the vertices $v$ of the graph by real vectors of length 1 such that any two vectors which belong to nonadjacent vertices in $G$ are orthogonal. Let us call such a set of vectors an orthonormal representation of $G$. Clearly, such a representation always exists: just take the unit vectors $(1, 0, \dots, 0)^T, (0, 1, 0, \dots, 0)^T, \dots, (0, 0, \dots, 1)^T$ of dimension $m = |V|$.


For the graph $C_5$ we may obtain an orthonormal representation in $\mathbb{R}^3$ by considering an “umbrella” with five ribs $v_1, \dots, v_5$ of unit length. Now open the umbrella (with tip at the origin) to the point where the angles between alternate ribs are 90°.

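Lovász then went on to show that the height $h$ of the umbrella, that is, the distance between the origin $O$ and the center $S$ of the regular pentagon formed by the tips of the ribs, provides the bound

$$\Theta(C_5) \le \frac{1}{h^2}. \qquad (4)$$

A simple calculation yields $h^2 = \frac{1}{\sqrt5}$; see the box on the next page. From this $\Theta(C_5) \le \sqrt5$ follows, and therefore $\Theta(C_5) = \sqrt5$.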
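The umbrella can also be written down in coordinates. The sketch below is our own construction, assuming the ribs sit at angles $2\pi i/5$ around the $z$-axis, all at the same height $h$. It solves for $h$ from the orthogonality of alternate ribs and checks numerically that $1/h^2 = \sqrt5$.

```python
import math

TAU = (1 + math.sqrt(5)) / 2          # golden section; cos(4*pi/5) = -TAU/2

# Ribs v_i = (r cos(2 pi i/5), r sin(2 pi i/5), h) with r^2 + h^2 = 1 (our coordinates).
# Alternate ribs orthogonal: r^2 cos(4 pi/5) + h^2 = 0, i.e. h^2 = (1 - h^2) TAU/2,
# which gives h^2 = TAU / (2 + TAU).
h2 = TAU / (2 + TAU)
r, h = math.sqrt(1 - h2), math.sqrt(h2)
ribs = [(r * math.cos(2 * math.pi * i / 5), r * math.sin(2 * math.pi * i / 5), h)
        for i in range(5)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

for i in range(5):
    assert math.isclose(dot(ribs[i], ribs[i]), 1.0)                           # unit ribs
    assert math.isclose(dot(ribs[i], ribs[(i + 2) % 5]), 0.0, abs_tol=1e-12)  # alternate ribs

print(h2, 1 / math.sqrt(5))   # h^2 = 1/sqrt(5)
print(1 / h2, math.sqrt(5))   # the bound (4): 1/h^2 = sqrt(5)
```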
Let us see how Lovász proceeded to prove the inequality (4). (His results were, in fact, much more general.) Consider the usual inner product

$$\langle x, y\rangle := x_1y_1 + \dots + x_sy_s$$

of two vectors $x = (x_1, \dots, x_s)$, $y = (y_1, \dots, y_s)$ in $\mathbb{R}^s$. Then $|x|^2 = \langle x, x\rangle = x_1^2 + \dots + x_s^2$ is the square of the length $|x|$ of $x$, and the angle $\gamma$ between $x$ and $y$ is given by

$$\cos\gamma = \frac{\langle x, y\rangle}{|x|\,|y|}.$$

Thus $\langle x, y\rangle = 0$ if and only if $x$ and $y$ are orthogonal.


Pentagons and the golden section 


Tradition has it that a rectangle was considered aesthetically pleasing if, after cutting off a square of length $a$, the remaining rectangle had the same shape as the original one. The side lengths $a, b$ of such a rectangle must satisfy $\frac{b}{a} = \frac{a}{b-a}$. Setting $\tau := \frac{b}{a}$ for the ratio, we obtain $\tau = \frac{1}{\tau-1}$ or $\tau^2 - \tau - 1 = 0$. Solving the quadratic equation yields the golden section $\tau = \frac{1+\sqrt5}{2} \approx 1.6180$.

Consider now a regular pentagon of side length $a$, and let $d$ be the length of its diagonals. It was already known to Euclid (Book XIII,8) that $d = \tau a$, and that the intersection point of two diagonals divides the diagonals in the golden section.

Here is Euclid's Book Proof. Label the pentagon $ABCDE$ and let $M$ be the intersection point of the diagonals $AC$ and $BE$. Since the total angle sum of the pentagon is $3\pi$, the angle at any vertex equals $\frac{3\pi}{5}$. It follows that $\angle ABE = \frac{\pi}{5}$, since $ABE$ is an isosceles triangle. This, in turn, implies $\angle AMB = \frac{3\pi}{5}$, and we conclude that the triangles $ABC$ and $AMB$ are similar. The quadrilateral $CMED$ is a rhombus since opposing sides are parallel (look at the angles), and so $|MC| = a$ and thus $|AM| = d - a$. By the similarity of $ABC$ and $AMB$ we conclude

$$\frac{d}{a} = \frac{|AC|}{|AB|} = \frac{|AB|}{|AM|} = \frac{a}{d-a} = \frac{|MC|}{|MA|} = \tau.$$

There is more to come. For the distance $s$ of a vertex to the center $S$ of the pentagon, the reader is invited to prove the relation $s^2 = \frac{d^2}{\tau^2 + 1}$ (note that $BS$ cuts the diagonal $AC$ at a right angle and halves it).

To finish our excursion into geometry, consider now the umbrella with the regular pentagon on top. Since alternate ribs (of length 1) form a right angle, the theorem of Pythagoras gives us $d = \sqrt2$, and hence $s^2 = \frac{2}{\tau^2 + 1} = \frac{4}{5 + \sqrt5}$. So, with Pythagoras again, we find for the height $h = |OS|$ our promised result

$$h^2 = 1 - s^2 = \frac{1 + \sqrt5}{5 + \sqrt5} = \frac{1}{\sqrt5}.$$
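The identities in the box can be spot-checked in floating point. This sketch uses our own coordinates (a regular pentagon inscribed in the unit circle, so $s = 1$); it verifies $d = \tau a$ and $s^2 = d^2/(\tau^2 + 1)$, then repeats the umbrella computation with $d = \sqrt2$.

```python
import math

TAU = (1 + math.sqrt(5)) / 2
assert math.isclose(TAU ** 2 - TAU - 1, 0.0, abs_tol=1e-12)   # tau^2 - tau - 1 = 0

# Regular pentagon inscribed in the unit circle (our choice), so s = 1.
pts = [(math.cos(2 * math.pi * j / 5), math.sin(2 * math.pi * j / 5)) for j in range(5)]
a = math.dist(pts[0], pts[1])     # side length
d = math.dist(pts[0], pts[2])     # diagonal length
s = 1.0                           # distance from a vertex to the center S

print(d / a, TAU)                       # Euclid: d = tau * a
print(s ** 2, d ** 2 / (TAU ** 2 + 1))  # s^2 = d^2 / (tau^2 + 1)

# Umbrella: alternate unit ribs at a right angle give d = sqrt(2).
s2 = 2 / (TAU ** 2 + 1)
print(s2, 4 / (5 + math.sqrt(5)))       # s^2 = 4 / (5 + sqrt 5)
print(1 - s2, 1 / math.sqrt(5))         # h^2 = 1 - s^2 = 1/sqrt(5)
```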


Now we head for an upper bound for the Shannon capacity of any graph $G$ that has an especially “nice” orthonormal representation. For this let $T = \{v^{(1)}, \dots, v^{(m)}\}$ be an orthonormal representation of $G$ in $\mathbb{R}^s$, where $v^{(i)}$ corresponds to the vertex $v_i$. We assume in addition that all the vectors $v^{(i)}$ have the same angle ($\ne 90^\circ$) with the vector $u := \frac1m\big(v^{(1)} + \dots + v^{(m)}\big)$, or equivalently that the inner product

$$\langle v^{(i)}, u\rangle = \sigma_T$$

has the same value $\sigma_T \ne 0$ for all $i$. Let us call this value $\sigma_T$ the constant of the representation $T$. For the Lovász umbrella that represents $C_5$ the condition $\langle v^{(i)}, u\rangle = \sigma_T$ certainly holds, for $u = \overrightarrow{OS}$.

Now we proceed in the following three steps. 

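(A) Consider a probability distribution $x = (x_1, \dots, x_m)$ on $V$ and set

$$\mu(x) := \big|x_1 v^{(1)} + \dots + x_m v^{(m)}\big|^2,$$

and

$$\mu_T(G) := \inf_x \mu(x).$$

Let $U$ be a largest independent set in $G$ with $|U| = \alpha$, and define $x_U = (x_1, \dots, x_m)$ with $x_i = \frac1\alpha$ if $v_i \in U$ and $x_i = 0$ otherwise. Since all vectors $v^{(i)}$ have unit length and $\langle v^{(i)}, v^{(j)}\rangle = 0$ for any two nonadjacent vertices, we infer

$$\mu_T(G) \le \mu(x_U) = \Big|\sum_{v_i \in U} \tfrac1\alpha\, v^{(i)}\Big|^2 = \sum_{v_i \in U} \frac{1}{\alpha^2} = \frac{1}{\alpha}, \qquad\text{that is,}\qquad \alpha(G) \le \frac{1}{\mu_T(G)}.$$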

(B) Next we compute $\mu_T(G)$. We need the Cauchy–Schwarz inequality

$$\langle a, b\rangle^2 \le |a|^2\,|b|^2$$

for vectors $a, b \in \mathbb{R}^s$. Applied to $a = x_1 v^{(1)} + \dots + x_m v^{(m)}$ and $b = u$, the inequality yields

$$\langle x_1 v^{(1)} + \dots + x_m v^{(m)}, u\rangle^2 \le \mu(x)\,|u|^2. \qquad (5)$$

By our assumption that $\langle v^{(i)}, u\rangle = \sigma_T$ for all $i$, we have

$$\langle x_1 v^{(1)} + \dots + x_m v^{(m)}, u\rangle = (x_1 + \dots + x_m)\,\sigma_T = \sigma_T$$

for any distribution $x$. Thus, in particular, this has to hold for the uniform distribution $(\frac1m, \dots, \frac1m)$, which implies $|u|^2 = \sigma_T$. Hence (5) reduces to

$$\sigma_T^2 \le \mu(x)\,\sigma_T \qquad\text{or}\qquad \mu_T(G) \ge \sigma_T.$$

On the other hand, for $x = (\frac1m, \dots, \frac1m)$ we obtain

$$\mu_T(G) \le \mu(x) = \Big|\tfrac1m\big(v^{(1)} + \dots + v^{(m)}\big)\Big|^2 = |u|^2 = \sigma_T,$$

and so we have proved

$$\mu_T(G) = \sigma_T. \qquad (6)$$

In summary, we have established the inequality

$$\alpha(G) \le \frac{1}{\sigma_T} \qquad (7)$$

for any orthonormal representation $T$ with constant $\sigma_T$.
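Steps (A) and (B) can be watched at work on the umbrella. The following sketch uses our own coordinates for the five ribs (angles $2\pi i/5$, common height $h$ with $h^2 = 1/\sqrt5$) and a hypothetical helper `mu`. It checks that $\mu$ at the uniform distribution equals $\sigma_T$, that $\mu(x_U) = 1/\alpha$, and that random distributions never fall below $\sigma_T$.

```python
import math
import random

# Lovász umbrella for C_5 (our coordinates): ribs at angles 2*pi*i/5 around the
# z-axis, all with height h, where h^2 = 1/sqrt(5).
h2 = 1 / math.sqrt(5)
r, h = math.sqrt(1 - h2), math.sqrt(h2)
T = [(r * math.cos(2 * math.pi * i / 5), r * math.sin(2 * math.pi * i / 5), h)
     for i in range(5)]

def mu(x, vectors):
    """mu(x) = |x_1 v^(1) + ... + x_m v^(m)|^2 for a distribution x."""
    dim = len(vectors[0])
    combo = [sum(xi * v[c] for xi, v in zip(x, vectors)) for c in range(dim)]
    return sum(c * c for c in combo)

m = len(T)
u = [sum(v[c] for v in T) / m for c in range(3)]
sigma = sum(a * b for a, b in zip(T[0], u))       # the constant sigma_T = <v^(1), u>

uniform = [1 / m] * m
x_U = [1 / 2, 0, 1 / 2, 0, 0]                      # U = {v_1, v_3}, alpha(C_5) = 2

random.seed(1)
samples = []
for _ in range(1000):
    w = [random.random() for _ in range(m)]
    total = sum(w)
    samples.append(mu([wi / total for wi in w], T))

print(sigma, mu(uniform, T))          # both 1/sqrt(5): mu_T(C_5) = sigma_T, as in (6)
print(mu(x_U, T))                     # 1/alpha(C_5) = 1/2
print(min(samples) >= sigma - 1e-12)  # Cauchy-Schwarz: mu(x) >= sigma_T for every x
```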


(C) To extend this inequality to $\Theta(G)$, we proceed as before. Consider again the product $G \times H$ of two graphs. Let $G$ and $H$ have orthonormal representations $R$ and $S$ in $\mathbb{R}^r$ and $\mathbb{R}^s$, respectively, with constants $\sigma_R$ and $\sigma_S$. Let $v = (v_1, \dots, v_r)$ be a vector in $R$ and $w = (w_1, \dots, w_s)$ be a vector in $S$. To the vertex in $G \times H$ corresponding to the pair $(v, w)$ we associate the vector

$$vw^T := (v_1w_1, \dots, v_1w_s, v_2w_1, \dots, v_2w_s, \dots, v_rw_1, \dots, v_rw_s) \in \mathbb{R}^{rs}.$$

Since $\langle vw^T, v'w'^T\rangle = \langle v, v'\rangle\langle w, w'\rangle$, it is immediately checked that $R \times S := \{vw^T : v \in R, w \in S\}$ is an orthonormal representation of $G \times H$ with constant $\sigma_R\sigma_S$. Hence by (6) we obtain

$$\mu_{R \times S}(G \times H) = \mu_R(G)\,\mu_S(H).$$
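The product construction rests on $\langle vw^T, v'w'^T\rangle = \langle v, v'\rangle\langle w, w'\rangle$. A small numerical sketch for $C_5 \times C_5$, using our own umbrella coordinates and helper names, checks unit length, orthogonality for nonadjacent pairs, and the constant $\sigma_R\sigma_R$.

```python
import math
from itertools import product

def kron(v, w):
    """The vector v w^T written out row by row: (v1 w1, ..., v1 ws, ..., vr ws)."""
    return [a * b for a in v for b in w]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Umbrella representation R of C_5 (our coordinates), constant sigma_R = h^2 = 1/sqrt(5).
h2 = 1 / math.sqrt(5)
r, h = math.sqrt(1 - h2), math.sqrt(h2)
R = [(r * math.cos(2 * math.pi * i / 5), r * math.sin(2 * math.pi * i / 5), h)
     for i in range(5)]

def confusable(i, j):
    """Equal or adjacent vertices of C_5 (labelled 0..4)."""
    return (i - j) % 5 in (0, 1, 4)

pairs = list(product(range(5), repeat=2))          # vertices of C_5 x C_5
RS = {p: kron(R[p[0]], R[p[1]]) for p in pairs}    # representation R x R in R^9
u = [sum(vec[c] for vec in RS.values()) / len(RS) for c in range(9)]

for p in pairs:
    assert math.isclose(dot(RS[p], RS[p]), 1.0)        # unit length
    assert math.isclose(dot(RS[p], u), h2 * h2)        # constant sigma_R * sigma_R
    for q in pairs:
        if not (confusable(p[0], q[0]) and confusable(p[1], q[1])):  # nonadjacent
            assert abs(dot(RS[p], RS[q])) < 1e-12

print("constant of R x R:", dot(RS[(0, 0)], u))    # about 0.2 = (1/sqrt 5)^2
```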


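For $G^n = G \times \dots \times G$ and the representation $T$ with constant $\sigma_T$, this means

$$\mu_{T^n}(G^n) = \mu_T(G)^n = \sigma_T^n,$$

and by (7) we obtain

$$\alpha(G^n) \le \sigma_T^{-n}, \qquad \sqrt[n]{\alpha(G^n)} \le \frac{1}{\sigma_T}.$$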


Taking all things together we have thus completed Lovász' argument:

Theorem. Whenever $T = \{v^{(1)}, \dots, v^{(m)}\}$ is an orthonormal representation of $G$ with constant $\sigma_T$, then

$$\Theta(G) \le \frac{1}{\sigma_T}. \qquad (8)$$

Looking at the Lovász umbrella, we have $u = (0, 0, h)^T$ with $h = 5^{-1/4}$, and hence $\sigma = \langle v^{(i)}, u\rangle = h^2 = \frac{1}{\sqrt5}$, which yields $\Theta(C_5) \le \sqrt5$. Thus Shannon's problem is solved.

$$A = \begin{pmatrix} 0 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 \end{pmatrix}$$

The adjacency matrix for the 5-cycle $C_5$


Let us carry our discussion a little further. We see from (8) that the larger $\sigma_T$ is for a representation of $G$, the better a bound for $\Theta(G)$ we will get. Here is a method that gives us an orthonormal representation for any graph $G$. To $G = (V, E)$ we associate the adjacency matrix $A = (a_{ij})$, which is defined as follows: Let $V = \{v_1, \dots, v_m\}$, then we set

$$a_{ij} := \begin{cases} 1 & \text{if } v_iv_j \in E, \\ 0 & \text{otherwise.} \end{cases}$$

$A$ is a real symmetric $m \times m$ matrix with 0's in the main diagonal.

Now we need two facts from linear algebra. First, as a symmetric matrix, $A$ has $m$ real eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$ (some of which may be equal), and the sum of the eigenvalues equals the sum of the diagonal entries of $A$, that is, 0. Hence the smallest eigenvalue must be negative (except in the trivial case when $G$ has no edges). Let $p = |\lambda_m| = -\lambda_m$ be the absolute value of the smallest eigenvalue, and consider the matrix

$$M := I + \frac{1}{p}A,$$

where $I$ denotes the $m \times m$ identity matrix. This $M$ has the eigenvalues $1 + \frac{\lambda_1}{p} \ge 1 + \frac{\lambda_2}{p} \ge \dots \ge 1 + \frac{\lambda_m}{p} = 0$. Now we quote the second result (the principal axis theorem of linear algebra): If $M = (m_{ij})$ is a real symmetric matrix with all eigenvalues $\ge 0$, then there are vectors $v^{(1)}, \dots, v^{(m)} \in \mathbb{R}^s$ for $s = \operatorname{rank}(M)$, such that

$$m_{ij} = \langle v^{(i)}, v^{(j)}\rangle \qquad (1 \le i, j \le m).$$

In particular, for $M = I + \frac{1}{p}A$ we obtain

$$\langle v^{(i)}, v^{(i)}\rangle = m_{ii} = 1 \quad\text{for all } i$$

and

$$\langle v^{(i)}, v^{(j)}\rangle = \frac{1}{p}\,a_{ij} \quad\text{for } i \ne j.$$

Since $a_{ij} = 0$ whenever $v_iv_j \notin E$, we see that the vectors $v^{(1)}, \dots, v^{(m)}$ form indeed an orthonormal representation of $G$.


Let us, finally, apply this construction to the $m$-cycles $C_m$ for odd $m \ge 5$. Here one easily computes $p = |\lambda_{\min}| = 2\cos\frac{\pi}{m}$ (see the box). Every row of the adjacency matrix contains two 1's, implying that every row of the matrix $M$ sums to $1 + \frac{2}{p}$. For the representation $\{v^{(1)}, \dots, v^{(m)}\}$ this means

$$\langle v^{(i)}, v^{(1)} + \dots + v^{(m)}\rangle = 1 + \frac{2}{p} = 1 + \frac{1}{\cos\frac{\pi}{m}}$$

and hence

$$\langle v^{(i)}, u\rangle = \frac{1}{m}\Big(1 + \big(\cos\tfrac{\pi}{m}\big)^{-1}\Big) = \sigma$$

for all $i$. We can therefore apply our main result (8) and conclude

$$\Theta(C_m) \le \frac{m}{1 + \big(\cos\frac{\pi}{m}\big)^{-1}} \qquad (\text{for } m \ge 5 \text{ odd}). \qquad (9)$$

Notice that because of $\cos\frac{\pi}{m} < 1$ the bound (9) is better than the bound $\Theta(C_m) \le \frac{m}{2}$ we found before. Note further $\cos\frac{\pi}{5} = \frac{\tau}{2}$, where $\tau = \frac{\sqrt5 + 1}{2}$ is the golden section. Hence for $m = 5$ we again obtain

$$\Theta(C_5) \le \frac{5}{1 + \frac{2}{\tau}} = \frac{5(\sqrt5 + 1)}{5 + \sqrt5} = \sqrt5.$$

The orthonormal representation given by this construction is, of course, precisely the “Lovász umbrella.”


And what about $C_7$, $C_9$, and the other odd cycles? By considering $\alpha(C_m^2)$, $\alpha(C_m^3)$ and other small powers the lower bound $\frac{m-1}{2} \le \Theta(C_m)$ can certainly be increased, but for no odd $m \ge 7$ do the best known lower bounds agree with the upper bound given in (8). So, twenty years after Lovász' marvelous proof of $\Theta(C_5) = \sqrt5$, these problems remain open and are considered very difficult, but after all we had this situation before.


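The eigenvalues of $C_m$

Look at the adjacency matrix $A$ of the cycle $C_m$. To find the eigenvalues (and eigenvectors) we use the $m$-th roots of unity. These are given by $1, \zeta, \zeta^2, \dots, \zeta^{m-1}$ for $\zeta = e^{2\pi i/m}$; see the box on page 37. Let $\lambda = \zeta^k$ be any of these roots, then we claim that $(1, \lambda, \lambda^2, \dots, \lambda^{m-1})^T$ is an eigenvector of $A$ to the eigenvalue $\lambda + \lambda^{-1}$. In fact, by the set-up of $A$ we find

$$A\begin{pmatrix} 1 \\ \lambda \\ \lambda^2 \\ \vdots \\ \lambda^{m-1} \end{pmatrix} = \begin{pmatrix} \lambda + \lambda^{m-1} \\ 1 + \lambda^2 \\ \lambda + \lambda^3 \\ \vdots \\ \lambda^{m-2} + 1 \end{pmatrix} = (\lambda + \lambda^{-1})\begin{pmatrix} 1 \\ \lambda \\ \lambda^2 \\ \vdots \\ \lambda^{m-1} \end{pmatrix}.$$

Since the vectors $(1, \lambda, \dots, \lambda^{m-1})$ are independent (they form a so-called Vandermonde matrix) we conclude that for odd $m$

$$\zeta^k + \zeta^{-k} = [\cos(2k\pi/m) + i\sin(2k\pi/m)] + [\cos(2k\pi/m) - i\sin(2k\pi/m)] = 2\cos(2k\pi/m) \qquad \big(0 \le k \le \tfrac{m-1}{2}\big)$$

are all the eigenvalues of $A$. Now the cosine is a decreasing function on $[0, \pi]$, and so

$$2\cos\Big(\frac{(m-1)\pi}{m}\Big) = -2\cos\frac{\pi}{m}$$

is the smallest eigenvalue of $A$.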
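The eigenvector claim in the box is easy to test numerically. This sketch (helper names are ours) builds the adjacency matrix of $C_m$, checks $Ax = (\lambda + \lambda^{-1})x$ for every $m$-th root of unity using Python's `cmath`, and confirms that $M = I + \frac1p A$ has smallest eigenvalue $0$.

```python
import cmath
import math

def cycle_adjacency(m):
    """Adjacency matrix of C_m, vertices 0..m-1, i adjacent to i-1 and i+1 (mod m)."""
    return [[1 if (i - j) % m in (1, m - 1) else 0 for j in range(m)] for i in range(m)]

def eigenvalue_from_root(m, k):
    """Check A x = (lam + 1/lam) x for x = (1, lam, ..., lam^(m-1)), lam = zeta^k,
    and return the (real) eigenvalue 2 cos(2 k pi / m)."""
    A = cycle_adjacency(m)
    lam = cmath.exp(2j * math.pi * k / m)
    x = [lam ** t for t in range(m)]
    Ax = [sum(A[i][j] * x[j] for j in range(m)) for i in range(m)]
    eig = lam + 1 / lam
    assert all(abs(Ax[i] - eig * x[i]) < 1e-9 for i in range(m))
    assert abs(eig - 2 * math.cos(2 * k * math.pi / m)) < 1e-9
    return eig.real

m = 7
eigs = [eigenvalue_from_root(m, k) for k in range(m)]
p = -min(eigs)
print(min(eigs), -2 * math.cos(math.pi / m))   # smallest eigenvalue = -2 cos(pi/m)
print(min(1 + e / p for e in eigs))            # M = I + A/p has smallest eigenvalue 0
```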


For example, for $m = 7$ all we know is

$$\sqrt[5]{350} \le \Theta(C_7) \le \frac{7\cos\frac{\pi}{7}}{1 + \cos\frac{\pi}{7}},$$

which is approximately $3.2271 \le \Theta(C_7) \le 3.3177$.
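To see how much (9) improves on $m/2$, here is a short table computed in Python. The function name is ours, and $\sqrt[5]{350}$ is simply the lower bound for $C_7$ quoted above.

```python
import math

def lovasz_bound_odd_cycle(m):
    """Right-hand side of (9): m / (1 + 1/cos(pi/m)), for odd m >= 5."""
    c = math.cos(math.pi / m)
    return m / (1 + 1 / c)

for m in (5, 7, 9, 11):
    # columns: m, alpha(C_m) = (m-1)/2, bound (9), earlier bound m/2
    print(m, (m - 1) / 2, round(lovasz_bound_odd_cycle(m), 4), m / 2)

print(round(350 ** (1 / 5), 4))   # the quoted lower bound for Theta(C_7)
```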
The chromatic number
of Kneser graphs

In 1955 the number theorist Martin Kneser posed a seemingly innocuous problem that became one of the great challenges in graph theory until a brilliant and totally unexpected solution, using the “Borsuk–Ulam theorem” from topology, was found by László Lovász twenty-three years later.

It happens often in mathematics that once a proof for a long-standing problem is found, a shorter one quickly follows, and so it was in this case. Within weeks Imre Bárány showed how to combine the Borsuk–Ulam theorem with another known result to elegantly settle Kneser's conjecture. Then in 2002 Joshua Greene, an undergraduate student, simplified Bárány's argument even further, and it is his version of the proof that we present here.
But let us start at the beginning. Consider the following graph $K(n,k)$, now called Kneser graph, for integers $n \ge k \ge 1$. The vertex-set $V(n,k)$ is the family of $k$-subsets of $\{1, \dots, n\}$, thus $|V(n,k)| = \binom{n}{k}$. Two such $k$-sets $A$ and $B$ are adjacent if they are disjoint, $A \cap B = \emptyset$.

If $n < 2k$, then any two $k$-sets intersect, resulting in the uninteresting case where $K(n,k)$ has no edges. So we assume from now on that $n \ge 2k$. Kneser graphs provide an interesting link between graph theory and finite sets. Consider, e.g., the independence number $\alpha(K(n,k))$, that is, we ask how large a family of pairwise intersecting $k$-sets can be. The answer is given by the Erdős–Ko–Rado theorem of Chapter 30: $\alpha(K(n,k)) = \binom{n-1}{k-1}$.

We can similarly study other interesting parameters of this graph family, and Kneser picked out the most challenging one: the chromatic number $\chi(K(n,k))$. We recall from previous chapters that a (vertex) coloring of a graph $G$ is a mapping $c : V \to \{1, \dots, m\}$ such that adjacent vertices are colored differently. The chromatic number $\chi(G)$ is then the minimum number of colors that is sufficient for a coloring of $V$. In other words, we want to present the vertex set $V$ as a disjoint union of as few color classes as possible, $V = V_1 \cup \dots \cup V_{\chi(G)}$, such that each set $V_i$ is edgeless.

For the graphs $K(n,k)$ this asks for a partition $V(n,k) = V_1 \cup \dots \cup V_\chi$, where every $V_i$ is an intersecting family of $k$-sets. Since we assume that $n \ge 2k$, we write from now on $n = 2k + d$, $k \ge 1$, $d \ge 0$.

Here is a simple coloring of $K(n,k)$ that uses $d + 2$ colors: For $i = 1, 2, \dots, d+1$, let $V_i$ consist of all $k$-sets that have $i$ as smallest element. The remaining $k$-sets are contained in the set $\{d+2, d+3, \dots, 2k+d\}$, which has only $2k - 1$ elements. Hence they all intersect, and we can use color $d + 2$ for all of them.
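The $(d+2)$-coloring above is short enough to check by brute force for small parameters. Here is a sketch, with hypothetical helper names, that colors each $k$-set by its smallest element (capped at $d + 2$) and verifies that disjoint $k$-sets never share a color.

```python
from itertools import combinations

def kneser_vertices(n, k):
    """All k-subsets of {1, ..., n}."""
    return [frozenset(c) for c in combinations(range(1, n + 1), k)]

def simple_coloring(A, n, k):
    """Color min(A) if min(A) <= d + 1, otherwise color d + 2, where d = n - 2k."""
    d = n - 2 * k
    return min(A) if min(A) <= d + 1 else d + 2

def is_proper(n, k, color):
    """Disjoint k-sets (adjacent in K(n, k)) must receive different colors."""
    V = kneser_vertices(n, k)
    return all(color(A, n, k) != color(B, n, k)
               for A, B in combinations(V, 2) if not A & B)

for n, k in [(5, 2), (6, 2), (7, 3), (9, 3)]:
    colors = {simple_coloring(A, n, k) for A in kneser_vertices(n, k)}
    print(n, k, is_proper(n, k, simple_coloring), len(colors))   # proper, d + 2 colors
```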
The Kneser graph $K(5,2)$ is the famous Petersen graph.

By the Erdős–Ko–Rado theorem this implies that $\chi(K(n,k)) \ge \frac{|V|}{\alpha} = \binom{n}{k}\Big/\binom{n-1}{k-1} = \frac{n}{k}$.

The 3-coloring of the Petersen graph.

For $d = 0$, $K(2k,k)$ consists of disjoint edges, one for every pair of complementary $k$-sets. Hence $\chi(K(2k,k)) = 2$, in accordance with the conjecture.

So we have $\chi(K(2k+d,k)) \le d + 2$, and Kneser's challenge was to show that this is the right number.

Kneser's conjecture. We have

$$\chi(K(2k+d,k)) = d + 2.$$


Probably anybody's first crack at the proof would be to try induction on $k$ and $d$. Indeed, the starting cases $k = 1$ and $d = 0, 1$ are easy, but the induction step from $k$ to $k + 1$ (or $d$ to $d + 1$) does not seem to work. So let us instead reformulate the conjecture as an existence problem:

If the family of $k$-sets of $\{1, 2, \dots, 2k+d\}$ is partitioned into $d + 1$ classes, $V(n,k) = V_1 \cup \dots \cup V_{d+1}$, then for some $i$, $V_i$ contains a pair $A, B$ of disjoint $k$-sets.
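For very small parameters the conjecture itself can be confirmed by exhaustive search. The sketch below is our own backtracking coloring routine, not part of the proof; it orders vertices breadth-first to keep the search small, and it is only practical for tiny Kneser graphs, which is exactly why a conceptual argument is needed.

```python
from collections import deque
from itertools import combinations

def kneser_graph(n, k):
    """Vertices: k-subsets of {1..n}; neighbors: disjoint k-subsets."""
    V = [frozenset(c) for c in combinations(range(1, n + 1), k)]
    nbrs = {A: [B for B in V if not A & B] for A in V}
    return V, nbrs

def bfs_order(V, nbrs):
    """Order vertices so that, within each component, every vertex after the
    first has an earlier neighbor; this keeps the backtracking search small."""
    seen, order = set(), []
    for s in V:
        if s in seen:
            continue
        seen.add(s)
        queue = deque([s])
        while queue:
            A = queue.popleft()
            order.append(A)
            for B in nbrs[A]:
                if B not in seen:
                    seen.add(B)
                    queue.append(B)
    return order

def colorable(n, k, q):
    """Backtracking test: can K(n, k) be properly colored with q colors?"""
    V, nbrs = kneser_graph(n, k)
    order = bfs_order(V, nbrs)
    color = {}

    def extend(i):
        if i == len(order):
            return True
        A = order[i]
        used = {color[B] for B in nbrs[A] if B in color}
        for c in range(q):
            if c not in used:
                color[A] = c
                if extend(i + 1):
                    return True
                del color[A]
        return False

    return extend(0)

def chromatic_number(n, k):
    q = 1
    while not colorable(n, k, q):
        q += 1
    return q

for n, k in [(4, 2), (5, 2), (6, 2), (6, 3)]:
    print(n, k, chromatic_number(n, k), n - 2 * k + 2)   # chi(K(n, k)) versus d + 2
```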

Lovász' brilliant insight was that at the (topological) heart of the problem lies a famous theorem about the $d$-dimensional unit sphere $S^d$ in $\mathbb{R}^{d+1}$,

$$S^d = \{x \in \mathbb{R}^{d+1} : |x| = 1\}.$$


The Borsuk–Ulam theorem
For every continuous map $f : S^d \to \mathbb{R}^d$ from $d$-sphere to $d$-space, there are antipodal points $x^*, -x^*$ that are mapped to the same point $f(x^*) = f(-x^*)$.
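For $d = 1$ the Borsuk–Ulam theorem is just the intermediate value theorem: for $f : S^1 \to \mathbb{R}$ the function $g(\theta) = f(\theta) - f(\theta + \pi)$ satisfies $g(\pi) = -g(0)$, so it has a zero. Here is a bisection sketch; the sample function `temperature` is an arbitrary example of ours, and points of $S^1$ are described by their angle.

```python
import math

def antipodal_coincidence(f, tol=1e-12):
    """Return theta in [0, pi] with f(theta) = f(theta + pi), for f continuous and
    2*pi-periodic (a function on the circle S^1), located by bisection."""
    def g(t):
        return f(t) - f(t + math.pi)       # g(pi) = -g(0) by periodicity

    lo, hi = 0.0, math.pi
    if g(lo) == 0:
        return lo
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (g(lo) < 0) == (g(mid) < 0):    # keep a sign change inside [lo, hi]
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def temperature(t):
    """An arbitrary continuous function on the circle (our example)."""
    return math.sin(t) + 0.5 * math.cos(3 * t) + 0.3 * math.cos(t) ** 2

t = antipodal_coincidence(temperature)
print(t, temperature(t), temperature(t + math.pi))   # the last two values agree
```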


This result is one of the cornerstones of topology; it first appeared in Borsuk's famous 1933 paper. We sketch a proof in the appendix; for the full proof we refer to Section 2.2 in Matoušek's wonderful book “Using the Borsuk–Ulam theorem”, whose very title demonstrates the power and range of the result. Indeed, there are many equivalent formulations, which underline the central position of the theorem. We will employ a version that can be traced back to a book by Lyusternik–Shnirel'man from 1930, which even predates Borsuk.


Theorem. If the $d$-sphere $S^d$ is covered by $d + 1$ sets,

$$S^d = U_1 \cup \dots \cup U_d \cup U_{d+1},$$

such that each of the first $d$ sets $U_1, \dots, U_d$ is either open or closed, then one of the $d + 1$ sets contains a pair of antipodal points $x^*, -x^*$.

The case when all $d+1$ sets are closed is due to Lyusternik and Shnirel'man. The case when all $d+1$ sets are open is equally common, and also called the Lyusternik–Shnirel'man theorem. Greene's insight was that the theorem is also true if each of the $d + 1$ sets is either open or closed. As you will see, we don't even need that: No such assumption is needed for $U_{d+1}$. For the proof of Kneser's conjecture, we only need the case when $U_1, \dots, U_d$ are open.

Proof of the Lyusternik–Shnirel'man theorem using Borsuk–Ulam. Let a covering $S^d = U_1 \cup \dots \cup U_d \cup U_{d+1}$ be given as specified, and assume that there are no antipodal points in any of the sets $U_i$. We define a map $f : S^d \to \mathbb{R}^d$ by

$$f(x) := \big(\delta(x, U_1), \delta(x, U_2), \dots, \delta(x, U_d)\big).$$

Here $\delta(x, U_i)$ denotes the distance of $x$ from $U_i$. Since this is a continuous function in $x$, the map $f$ is continuous. Thus the Borsuk–Ulam theorem tells us that there are antipodal points $x^*, -x^*$ with $f(x^*) = f(-x^*)$. Since $U_{d+1}$ does not contain antipodes, we get that at least one of $x^*$ and $-x^*$ must be contained in one of the sets $U_1, \dots, U_d$, say in $U_k$ ($k \le d$). After exchanging $x^*$ with $-x^*$ if necessary, we may assume that $x^* \in U_k$. In particular this yields $\delta(x^*, U_k) = 0$, and from $f(x^*) = f(-x^*)$ we get that $\delta(-x^*, U_k) = 0$ as well.

If $U_k$ is closed, then $\delta(-x^*, U_k) = 0$ implies that $-x^* \in U_k$, and we arrive at the contradiction that $U_k$ contains a pair of antipodal points.

If $U_k$ is open, then $\delta(-x^*, U_k) = 0$ implies that $-x^*$ lies in $\overline{U_k}$, the closure of $U_k$. The set $\overline{U_k}$, in turn, is contained in $S^d \setminus (-U_k)$, since this is a closed subset of $S^d$ that contains $U_k$. But this means that $-x^*$ lies in $S^d \setminus (-U_k)$, so it cannot lie in $-U_k$, and $x^*$ cannot lie in $U_k$, a contradiction.


As the second ingredient for his proof, Imre Bárány used another existence result about the sphere $S^d$.

Gale's Theorem. There is an arrangement of $2k + d$ points on $S^d$ such that every open hemisphere contains at least $k$ of these points.


David Gale discovered his theorem in 1956 in the context of polytopes with many faces. He presented a complicated induction proof, but today, with hindsight, we can quite easily exhibit such a set and verify its properties. Armed with these results it is just a short step to settle Kneser's problem, but as Greene showed we can do even better: We don't even need Gale's result. It suffices to take any arrangement of $2k + d$ points on $S^{d+1}$ in general position, meaning that no $d + 2$ of the points lie on a hyperplane through the center of the sphere. Clearly, for $d \ge 0$ this can be done.
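One way to see that such points exist (our choice of construction, not the text's): take directions on the moment curve $(1, t, t^2, \dots, t^{d+1})$ for distinct $t$. Any $d + 2$ of them form a Vandermonde matrix and are therefore linearly independent, and scaling each vector onto $S^{d+1}$ does not change that. The sketch checks this with exact determinants over the rationals.

```python
from fractions import Fraction
from itertools import combinations

def det(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in rows]
    n, sign, result = len(M), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return sign * result

def moment_points(count, dim):
    """Directions (1, t, ..., t^(dim-1)) for t = 1..count. Scaling each one to unit
    length does not change which subsets lie on a hyperplane through the origin."""
    return [[t ** e for e in range(dim)] for t in range(1, count + 1)]

def in_general_position(points):
    """No dim of the vectors lie on a common hyperplane through the origin."""
    dim = len(points[0])
    return all(det(list(S)) != 0 for S in combinations(points, dim))

k, d = 3, 2
pts = moment_points(2 * k + d, d + 2)     # 2k + d directions in R^(d+2)
print(in_general_position(pts))           # True: any d + 2 of them are independent
```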


Proof of the Kneser conjecture. For our ground set let us take $2k + d$ points in general position on the sphere $S^{d+1}$. Suppose the set $V(n,k)$ of all $k$-subsets of this set is partitioned into $d + 1$ classes, $V(n,k) = V_1 \cup \dots \cup V_{d+1}$. We have to find a pair of disjoint $k$-sets $A$ and $B$ that belong to the same class $V_i$.

For $i = 1, \dots, d+1$ we set

$$O_i := \{x \in S^{d+1} : \text{the open hemisphere } H_x \text{ with pole } x \text{ contains a } k\text{-set from } V_i\}.$$

Clearly, each $O_i$ is an open set. Together, the open sets $O_i$ and the closed set $C := S^{d+1} \setminus (O_1 \cup \dots \cup O_{d+1})$ cover $S^{d+1}$. Invoking Lyusternik–Shnirel'man we know that one of these sets contains antipodal points $x^*$ and $-x^*$. This set cannot be $C$! Indeed, if $x^*, -x^*$ are in $C$, then by the definition of the $O_i$'s, the hemispheres $H_{x^*}$ and $H_{-x^*}$ would contain fewer than $k$ points each. Since there are $2k + d$ points in total, at least $d + 2$ points would then be on the equator $\overline{H_{x^*}} \cap \overline{H_{-x^*}}$ with respect to the north pole $x^*$, that is, on a hyperplane through the origin. But this cannot be since the points are in general position. Hence some $O_i$ contains a pair $x^*, -x^*$, so there exist $k$-sets $A$ and $B$ both in class $V_i$, with $A \subseteq H_{x^*}$ and $B \subseteq H_{-x^*}$.

But since we are talking about open hemispheres, $H_{x^*}$ and $H_{-x^*}$ are disjoint, hence $A$ and $B$ are disjoint, and this is the whole proof.

The closure $\overline{U_k}$ of $U_k$ is the smallest closed set that contains $U_k$ (that is, the intersection of all closed sets containing $U_k$).

An open hemisphere in $S^2$


The reader may wonder whether sophisticated results such as the theorem of Borsuk–Ulam are really necessary to prove a statement about finite sets. Indeed, a beautiful combinatorial argument has later been found by Jiří Matoušek, but on closer inspection it has a distinct, albeit discrete, topological flavor.


Appendix:
A proof sketch for the Borsuk–Ulam theorem


For any generic map (also known as general position map) from a compact $d$-dimensional space to a $d$-dimensional space, any point in the image has only a finite number of pre-images. For a generic map from a $(d+1)$-dimensional space to a $d$-dimensional space, we expect every point in the image to have a 1-dimensional pre-image, that is, a collection of curves. Both in the case of smooth maps, and in the setting of piecewise-linear maps, one quite easily proves one can deform any map to a nearby generic map.

For the Borsuk–Ulam theorem, the idea is to show that every generic map $S^d \to \mathbb{R}^d$ identifies an odd (in particular, finite and nonzero) number of antipodal pairs. If $f$ did not identify any antipodal pair, then it would be arbitrarily close to a generic map $\tilde f$ without any such identification.

Now consider the projection $\pi : S^d \to \mathbb{R}^d$ that just deletes the last coordinate; this map identifies the “north pole” $e_{d+1}$ of the $d$-sphere with the “south pole” $-e_{d+1}$. For any given map $f : S^d \to \mathbb{R}^d$ we construct a continuous deformation from $\pi$ to $f$, that is, we interpolate between these two maps (linearly, for example), to obtain a continuous map

$$F : S^d \times [0,1] \to \mathbb{R}^d,$$

with $F(x, 0) = \pi(x)$ and $F(x, 1) = f(x)$ for all $x \in S^d$. (Such a map is known as a homotopy.)

Now we perturb $F$ carefully into a generic map $\tilde F : S^d \times [0,1] \to \mathbb{R}^d$, which again we may assume to be smooth, or piecewise-linear on a fine triangulation of $S^d \times [0,1]$. If this perturbation is “small enough” and performed carefully, then the perturbed version of the projection $\tilde\pi(x) := \tilde F(x, 0)$ should still identify the two antipodal points $\pm e_{d+1}$, and no others. If $\tilde F$ is sufficiently generic, then the points in $S^d \times [0,1]$ given by

$$M := \{(x, t) \in S^d \times [0,1] : \tilde F(-x, t) = \tilde F(x, t)\}$$

according to the implicit function theorem (smooth or piecewise-linear version) form a collection of paths and of closed curves. Clearly this collection is symmetric, that is, $(-x, t) \in M$ if and only if $(x, t) \in M$.

The paths in $M$ can have endpoints only at the boundary of $S^d \times [0,1]$, that is, at $t = 0$ and at $t = 1$. The only ends at $t = 0$, however, are at $(\pm e_{d+1}, 0)$, and the two paths that start at these two points are symmetric copies of each other, so they are disjoint, and they can end only at $t = 1$. This proves that there are solutions for $\tilde F(-x, t) = \tilde F(x, t)$ at $t = 1$, and hence for $f(-x) = f(x)$.
Of friends and politicians
Chapter 44

It is not known who first raised the following problem or who gave it its human touch. Here it is:

Suppose in a group of people we have the situation that any pair of persons have precisely one common friend. Then there is always a person (the “politician”) who is everybody's friend.


In the mathematical jargon this is called the friendship theorem. 


“A politician's smile”

Before tackling the proof let us rephrase the problem in graph-theoretic terms. We interpret the people as the set of vertices $V$ and join two vertices by an edge if the corresponding people are friends. We tacitly assume that friendship is always two-ways, that is, if $u$ is a friend of $v$, then $v$ is also a friend of $u$, and further that nobody is his or her own friend. Thus the theorem takes on the following form:


Theorem. Suppose that G is a finite graph in which any two vertices have 
precisely one common neighbor. Then there is a vertex which is adjacent to 
all other vertices. 


Note that there are finite graphs with this property; see the figure, where u 
is the politician. However, these “windmill graphs” also turn out to be the 
only graphs with the desired property. Indeed, it is not hard to verify that in 
the presence of a politician only the windmill graphs are possible. 
Surprisingly, the friendship theorem does not hold for infinite graphs! 
Indeed, for an inductive construction of a counterexample one may start for 
example with a 5-cycle, and repeatedly add common neighbors for all pairs 
of vertices in the graph that don’t have one, yet. This leads to a (countably) 
infinite friendship graph without a politician. 
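Windmill graphs, and the fact that they have a politician, are easy to check by computer. A minimal sketch with our own vertex labelling (vertex 0 is the center) and hypothetical helper names:

```python
from itertools import combinations

def windmill(t):
    """t triangles sharing vertex 0 (the politician); vertices 0..2t."""
    edges = set()
    for i in range(t):
        a, b = 2 * i + 1, 2 * i + 2
        edges |= {frozenset({0, a}), frozenset({0, b}), frozenset({a, b})}
    return 2 * t + 1, edges

def friendship_property(n, edges):
    """Every two distinct vertices have exactly one common neighbor."""
    nbrs = {v: {w for w in range(n) if frozenset({v, w}) in edges} for v in range(n)}
    return all(len(nbrs[u] & nbrs[v]) == 1 for u, v in combinations(range(n), 2))

def politicians(n, edges):
    """Vertices adjacent to all other vertices."""
    return [v for v in range(n)
            if all(frozenset({v, w}) in edges for w in range(n) if w != v)]

for t in range(1, 5):
    n, E = windmill(t)
    print(t, friendship_property(n, E), politicians(n, E))

# A 5-cycle violates the condition (adjacent vertices have no common neighbor).
C5 = {frozenset({i, (i + 1) % 5}) for i in range(5)}
print(friendship_property(5, C5))   # False
```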


A windmill graph 


Several proofs of the friendship theorem exist, but the first proof, given by Paul Erdős, Alfréd Rényi and Vera Sós, is still the most accomplished.


Proof. Suppose the assertion is false, and $G$ is a counterexample, that is, no vertex of $G$ is adjacent to all other vertices. To derive a contradiction we proceed in two steps. The first part is combinatorics, and the second part is linear algebra.

(1) We claim that $G$ is a regular graph, that is, $d(u) = d(v)$ for any $u, v \in V$. Note first that the condition of the theorem implies that there are no cycles of length 4 in $G$ (two opposite vertices of such a cycle would have two common neighbors). Let us call this the $C_4$-condition.

We first prove that any two nonadjacent vertices $u$ and $v$ have equal degree $d(u) = d(v)$. Suppose $d(u) = k$, where $w_1, \dots, w_k$ are the neighbors of $u$. Exactly one of the $w_i$, say $w_2$, is adjacent to $v$, and $w_2$ is adjacent to exactly one of the other $w_i$'s, say $w_1$, so that we have the situation of the figure to the left. The vertex $v$ has with $w_1$ the common neighbor $w_2$, and with $w_i$ ($i \ge 2$) a common neighbor $z_i$ ($i \ge 2$). By the $C_4$-condition, all these $z_i$ must be distinct. We conclude $d(v) \ge k = d(u)$, and thus $d(u) = d(v) = k$ by symmetry.

To finish the proof of (1), observe that any vertex different from $w_2$ is nonadjacent to at least one of $u$ and $v$ (as $w_2$ is their only common neighbor), and hence has degree $k$, by what we already proved. But since $w_2$ also has a non-neighbor ($G$ has no politician), it has degree $k$ as well, and thus $G$ is $k$-regular.


Summing over the degrees of the $k$ neighbors of $u$ we get $k^2$. Since every vertex (except $u$) has exactly one common neighbor with $u$, we have counted every vertex once, except for $u$, which was counted $k$ times. So the total number of vertices of $G$ is

$$n = k^2 - k + 1. \qquad (1)$$


(2) The rest of the proof is a beautiful application of some standard results of linear algebra. Note first that $k$ must be greater than 2, since for $k \le 2$ only $G = K_1$ and $G = K_3$ are possible by (1), both of which are trivial windmill graphs. Consider the adjacency matrix $A = (a_{ij})$, as defined on page 298. By part (1), any row has exactly $k$ 1's, and by the condition of the theorem, for any two rows there is exactly one column where they both have a 1. Note further that the main diagonal consists of 0's. Hence we have

$$A^2 = \begin{pmatrix} k & 1 & \cdots & 1 \\ 1 & k & \ddots & \vdots \\ \vdots & \ddots & \ddots & 1 \\ 1 & \cdots & 1 & k \end{pmatrix} = (k-1)I + J,$$

where $I$ is the identity matrix, and $J$ the matrix of all 1's. It is immediately checked that $J$ has the eigenvalues $n$ (of multiplicity 1) and 0 (of multiplicity $n - 1$). It follows that $A^2$ has the eigenvalues $k - 1 + n = k^2$ (of multiplicity 1) and $k - 1$ (of multiplicity $n - 1$).

Since $A$ is symmetric and hence diagonalizable, we conclude that $A$ has the eigenvalues $k$ (of multiplicity 1, with the all-ones vector as eigenvector, since $G$ is $k$-regular) and $\pm\sqrt{k-1}$. Suppose $r$ of the eigenvalues are equal to $\sqrt{k-1}$ and $s$ of them are equal to $-\sqrt{k-1}$, with $r + s = n - 1$. Now we are almost home. Since the sum of the eigenvalues of $A$ equals the trace (which is 0), we find

$$k + r\sqrt{k-1} - s\sqrt{k-1} = 0,$$

and, in particular, $r \ne s$, and

$$\sqrt{k-1} = \frac{k}{s-r}.$$

Now if the square root $\sqrt{m}$ of a natural number $m$ is rational, then it is an integer! An elegant proof for this was presented by Dedekind in 1858: Let $n_0$ be the smallest natural number with $n_0\sqrt{m} \in \mathbb{N}$. If $\sqrt{m} \notin \mathbb{N}$, then there exists $\ell \in \mathbb{N}$ with $0 < \sqrt{m} - \ell < 1$. Setting $n_1 := n_0(\sqrt{m} - \ell)$, we find $n_1 \in \mathbb{N}$ and $n_1\sqrt{m} = n_0(\sqrt{m} - \ell)\sqrt{m} = n_0 m - \ell(n_0\sqrt{m}) \in \mathbb{N}$. With $n_1 < n_0$ this yields a contradiction to the choice of $n_0$.

Returning to our equation, let us set $h = \sqrt{k-1} \in \mathbb{N}$; then

$$h(s - r) = k = h^2 + 1.$$

Since $h$ divides $h^2 + 1$ and $h^2$, we find that $h$ must be equal to 1, and thus $k = 2$, which we have already excluded. So we have arrived at a contradiction, and the proof is complete.
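The last step can also be mirrored by a search: for each $k$, look for integers $r, s \ge 0$ with $r + s = n - 1$ that satisfy the trace equation. The sketch below is our own framing, using only integer arithmetic (`math.isqrt`); it finds a solution only for $k = 2$, which corresponds to the triangle $K_3$.

```python
import math

def trace_solutions(k):
    """All (r, s) with r, s >= 0, r + s = n - 1 (n = k^2 - k + 1) and
    k + (r - s) * sqrt(k - 1) = 0, found with integer arithmetic only (k >= 2)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    n = k * k - k + 1
    h = math.isqrt(k - 1)
    if h * h != k - 1:
        return []          # sqrt(k-1) irrational, but s - r = k / sqrt(k-1) must be an integer
    if k % h != 0:
        return []          # h (s - r) = k = h^2 + 1 forces h to divide h^2 + 1
    diff = k // h          # s - r
    if diff > n - 1 or (n - 1 - diff) % 2:
        return []
    r = (n - 1 - diff) // 2
    return [(r, r + diff)]

print(trace_solutions(2))                                 # [(0, 2)]: the triangle K_3
print([k for k in range(3, 500) if trace_solutions(k)])   # []: no counterexample
```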


However, the story is not quite over. Let us rephrase our theorem in the following way: Suppose $G$ is a graph with the property that between any two vertices there is exactly one path of length 2. Clearly, this is an equivalent formulation of the friendship condition. Our theorem then says that the only such graphs are the windmill graphs. But what if we consider paths of length more than 2? A conjecture of Anton Kotzig asserts that the analogous situation is impossible.

Kotzig's Conjecture. Let $\ell > 2$. Then there are no finite graphs with the property that between any two vertices there is precisely one path of length $\ell$.

Kotzig himself verified his conjecture for $\ell \le 8$. In [3] his conjecture is proved up to $\ell = 20$, and Alexandr Kostochka has told us that it is now verified for all $\ell \le 33$. A general proof, however, seems to be out of reach...

Probability makes counting
(sometimes) easy

Just as we started this book with the first papers of Paul Erdős in number theory, we close it by discussing what will possibly be considered his most lasting legacy: the introduction, together with Alfréd Rényi, of the probabilistic method. Stated in the simplest way it says:

If, in a given set of objects, the probability that an object does not have a certain property is less than 1, then there must exist an object with this property.

Thus we have an existence result. It may be (and often is) very difficult to find this object, but we know that it exists. We present here three examples (of increasing sophistication) of this probabilistic method due to Erdős, and end with a particularly elegant, quite recent application.


As a warm-up, consider a family $\mathcal F$ of subsets $A_i$, all of size d ≥ 2, of a
finite ground-set X. We say that $\mathcal F$ is 2-colorable if there exists a coloring
of X with two colors such that in every set $A_i$ both colors appear. It is
immediate that not every family can be colored in this way. As an example,
take all subsets of size d of a (2d − 1)-set X. Then no matter how we
2-color X, there must be d elements which are colored alike. On the other
hand, it is equally clear that every subfamily of a 2-colorable family of
d-sets is itself 2-colorable. Hence we are interested in the smallest number
m = m(d) for which a family with m sets exists which is not 2-colorable.
Phrased differently, m(d) is the largest number which guarantees that
every family with fewer than m(d) sets is 2-colorable.
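For tiny families the definition can be checked directly. The following Python sketch is our own illustration, and the helper name `find_two_coloring` is hypothetical. It tries all $2^{|X|}$ colorings of the ground set and confirms the example above for d = 3: the ten 3-subsets of a 5-set cannot be 2-colored.

```python
from itertools import combinations, product

def find_two_coloring(ground, family):
    """Return a 2-coloring (dict: element -> 0 or 1) in which every set of
    `family` contains both colors, or None if there is none.
    Brute force over all 2^|ground| colorings, so only for tiny examples."""
    ground = list(ground)
    for colors in product((0, 1), repeat=len(ground)):
        coloring = dict(zip(ground, colors))
        if all(len({coloring[x] for x in A}) == 2 for A in family):
            return coloring
    return None

# All d-subsets of a (2d-1)-set X: every 2-coloring has d elements colored alike.
d = 3
X = range(2 * d - 1)
all_d_sets = [set(A) for A in combinations(X, d)]
print(find_two_coloring(X, all_d_sets))                    # None

# A small subfamily, here the three 3-sets containing {0, 1}, is 2-colorable.
print(find_two_coloring(X, all_d_sets[:3]) is not None)    # True
```

The running time doubles with each extra element, so this is only a way to experiment with very small cases.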


Theorem 1. Every family of at most $2^{d-1}$ d-sets is 2-colorable, that is,
$m(d) > 2^{d-1}$.


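Proof. Suppose $\mathcal F$ is a family of d-sets with at most $2^{d-1}$ sets. Color X
randomly with two colors, all colorings being equally likely. For each set
$A \in \mathcal F$ let $E_A$ be the event that all elements of A are colored alike. Since
there are precisely two such colorings of A, each occurring with probability $(\frac12)^d$, we have

$$\mathrm{Prob}(E_A) = 2\left(\tfrac12\right)^{d} = \left(\tfrac12\right)^{d-1},$$

and hence with $m = |\mathcal F| \le 2^{d-1}$ (note that the events $E_A$ are not disjoint)

$$\mathrm{Prob}\Big(\bigcup_{A\in\mathcal F} E_A\Big) < \sum_{A\in\mathcal F}\mathrm{Prob}(E_A) = m\left(\tfrac12\right)^{d-1} \le 1.$$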
We conclude that there exists some 2-coloring of X without a unicolored
d-set from $\mathcal F$, and this is just our condition of 2-colorability.
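The proof is nonconstructive. Still, since the failure probability is below 1, it also suggests a simple randomized search: color at random, check, and repeat. The sketch below is our own illustration. The names `union_bound` and `random_two_coloring` and the retry limit `max_tries` are assumptions, not taken from the text.

```python
import random

def union_bound(m, d):
    """The bound m * (1/2)^(d-1) on Prob(some set of the family is unicolored)."""
    return m * 0.5 ** (d - 1)

def random_two_coloring(ground, family, rng, max_tries=10_000):
    """Las Vegas search: draw uniformly random 2-colorings of `ground` until
    every set of `family` sees both colors. Returns that coloring, or None
    if max_tries (an arbitrary safeguard) is exhausted."""
    ground = list(ground)
    for _ in range(max_tries):
        coloring = {x: rng.randrange(2) for x in ground}
        if all(len({coloring[x] for x in A}) == 2 for A in family):
            return coloring
    return None

# m = 2^(d-1) = 4 sets of size d = 3: the largest case covered by Theorem 1.
family = [{0, 1, 2}, {0, 3, 4}, {1, 3, 5}, {2, 4, 5}]
print(union_bound(len(family), 3))                  # 1.0
print(random_two_coloring(range(6), family, random.Random(0)))
```

In this example the union bound equals 1. Because the events $E_A$ overlap, each trial still succeeds with some fixed positive probability, so the expected number of trials is finite.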


An upper bound for m(d), roughly equal to $d^2 2^d$, was also established by
Erdős, again using the probabilistic method, this time taking random sets
and a fixed coloring. Using a very clever argument, Jaikumar Radhakrishnan
and Aravind Srinivasan have established the best lower bound to date,
which is approximately equal to $\sqrt{d/\log d}\;2^d$. As for exact values, only the
first two, m(2) = 3 and m(3) = 7, are known. Of course, m(2) = 3 is realized
by the graph $K_3$, while the Fano configuration yields m(3) ≤ 7. Here $\mathcal F$
consists of the seven 3-sets of the figure (including the circle set {4, 5, 6}).
The reader may find it fun to show that $\mathcal F$ cannot be 2-colored. To prove
that all families of six 3-sets are 2-colorable, and hence m(3) = 7, requires
a little more care.
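Both claims about the Fano configuration can be confirmed by brute force over the $2^7 = 128$ colorings. In this sketch the point labels come from the difference set {0, 1, 3} mod 7. This labeling is our own assumption and need not match the numbering in the figure. Any two Fano lines meet in exactly one point, so deleting any one line leaves a 2-colorable family. The code checks that as well.

```python
from itertools import product

# The seven lines {i, i+1, i+3} (mod 7) of the Fano plane on points 0..6.
FANO_LINES = [{i % 7, (i + 1) % 7, (i + 3) % 7} for i in range(7)]

def count_proper_colorings(points, family):
    """Number of 2-colorings of `points` in which no set of `family` is unicolored."""
    count = 0
    for colors in product((0, 1), repeat=len(points)):
        coloring = dict(zip(points, colors))
        if all(len({coloring[x] for x in A}) == 2 for A in family):
            count += 1
    return count

points = list(range(7))
print(count_proper_colorings(points, FANO_LINES))        # 0, hence m(3) <= 7
# Removing any single line leaves a 2-colorable family of six 3-sets.
print(all(count_proper_colorings(points, FANO_LINES[:j] + FANO_LINES[j + 1:]) > 0
          for j in range(7)))                             # True
```

This only covers the six-line subfamilies of the Fano plane. It does not replace the general argument that every family of six 3-sets is 2-colorable.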


Our next example is the classic in the field: Ramsey numbers. Consider
the complete graph $K_N$ on N vertices. We say that $K_N$ has property (m, n)
if, no matter how we color the edges of $K_N$ red and blue, there is always a
complete subgraph on m vertices with all edges colored red or a complete
subgraph on n vertices with all edges colored blue. It is clear that if $K_N$
has property (m, n), then so does every $K_s$ with s ≥ N. So, as in the first
example, we ask for the smallest number N (if it exists) with this property;
this is the Ramsey number R(m, n).

As a start, we certainly have R(m, 2) = m because either all of the edges
of $K_m$ are red or there is a blue edge, resulting in a blue $K_2$. By symmetry,
we have R(2, n) = n. Now, suppose R(m − 1, n) and R(m, n − 1) exist.
We then prove that R(m, n) exists and that

$$R(m,n) \le R(m-1,n) + R(m,n-1). \qquad (1)$$


Suppose N = R(m − 1, n) + R(m, n − 1), and consider an arbitrary red-blue
coloring of $K_N$. For a vertex v, let A be the set of vertices joined to v
by a red edge, and B the vertices joined by a blue edge.

Since |A| + |B| = N − 1, we find that either |A| ≥ R(m − 1, n) or
|B| ≥ R(m, n − 1). Suppose |A| ≥ R(m − 1, n), the other case being
analogous. Then by the definition of R(m − 1, n), there either exists in A a
subset $A_R$ of size m − 1 all of whose edges are colored red, which together
with v yields a red $K_m$, or there is a subset $A_B$ of size n with all edges
colored blue. We infer that $K_N$ satisfies the (m, n)-property and Claim (1)
follows.


Combining (1) with the starting values R(m, 2) = m and R(2, n) = n, we
obtain from the familiar recursion for binomial coefficients

$$R(m,n) \le \binom{m+n-2}{m-1} \qquad (2)$$

and, in particular,

$$R(k,k) \le \binom{2k-2}{k-1} = \binom{2k-3}{k-1} + \binom{2k-3}{k-2} \le 2^{2k-3}.$$
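The bounds (1) and (2) are easy to tabulate. In this sketch the function name `ramsey_upper` is ours. It unrolls the recursion (1) from the starting values R(m, 2) = m and R(2, n) = n. The assertion checks that the recursion reproduces the binomial coefficient in (2), and the loop prints the diagonal bound next to $2^{2k-3}$. These numbers are upper bounds only, not the Ramsey numbers themselves.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def ramsey_upper(m, n):
    """Upper bound for R(m, n) obtained by unrolling recursion (1)."""
    if m == 2:
        return n          # R(2, n) = n
    if n == 2:
        return m          # R(m, 2) = m
    return ramsey_upper(m - 1, n) + ramsey_upper(m, n - 1)

# The recursion yields exactly the binomial coefficient in (2).
assert all(ramsey_upper(m, n) == comb(m + n - 2, m - 1)
           for m in range(2, 12) for n in range(2, 12))

for k in range(2, 8):
    print(k, ramsey_upper(k, k), 2 ** (2 * k - 3))
```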




Now what we are really interested in is a lower bound for R(k, k). This
amounts to proving for an as-large-as-possible N < R(k, k) that there
exists a coloring of the edges of $K_N$ such that no red or blue $K_k$ results. And this
is where the probabilistic method comes into play.


Theorem 2. For all k ≥ 2, the following lower bound holds for the Ramsey
numbers:

$$R(k,k) \ge 2^{k/2}.$$


Proof. We have R(2, 2) = 2. From (2) we know R(3, 3) ≤ 6, and the
pentagon colored as in the figure shows R(3, 3) = 6.


Now let us assume k ≥ 4. Suppose $N < 2^{k/2}$, and consider all red-blue
colorings, where we color each edge independently red or blue with probability
$\frac12$. Thus all colorings are equally likely with probability $2^{-\binom N2}$. Let A
be a set of vertices of size k. The probability of the event $A_R$ that the edges
in A are all colored red is then $2^{-\binom k2}$. Hence it follows that the probability
$p_R$ for some k-set to be colored all red is bounded by

$$p_R = \mathrm{Prob}\Big(\bigcup_{|A|=k} A_R\Big) \le \sum_{|A|=k}\mathrm{Prob}(A_R) = \binom Nk 2^{-\binom k2}.$$

Now with $N < 2^{k/2}$ and k ≥ 4, using $\binom Nk \le \frac{N^k}{2^{k-1}}$ for k ≥ 2 (see page 14),
we have

$$\binom Nk 2^{-\binom k2} \le \frac{N^k}{2^{k-1}}\,2^{-\binom k2} < 2^{\frac{k^2}{2}-\binom k2-k+1} = 2^{-\frac k2+1} \le \frac12.$$

Hence $p_R < \frac12$, and by symmetry $p_B < \frac12$ for the probability of some
k vertices with all edges between them colored blue. We conclude that
$p_R + p_B < 1$ for $N < 2^{k/2}$, so there must be a coloring with no red or
blue $K_k$, which means that $K_N$ does not have property (k, k).
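The small cases and the final estimate can both be checked mechanically. The sketch below is our own, and its helper names are hypothetical. It checks three things:

- the pentagon coloring of $K_5$ (5-cycle red, pentagram blue) has no monochromatic triangle;
- all $2^{15}$ colorings of $K_6$ do have one;
- the union bound $\binom Nk 2^{-\binom k2}$ stays below 1/2 for the largest $N < 2^{k/2}$.

Exact rational arithmetic avoids floating-point overflow for larger k.

```python
import math
from fractions import Fraction
from itertools import combinations, product

def has_mono_triangle(n, red):
    """red: set of frozenset edges colored red; every other edge of K_n is blue."""
    for a, b, c in combinations(range(n), 3):
        sides = {frozenset((a, b)) in red, frozenset((b, c)) in red,
                 frozenset((a, c)) in red}
        if len(sides) == 1:                     # all three sides have one color
            return True
    return False

# Pentagon coloring of K_5: red 5-cycle, blue pentagram. No monochromatic triangle.
pentagon_red = {frozenset((i, (i + 1) % 5)) for i in range(5)}
print(has_mono_triangle(5, pentagon_red))                      # False

# All 2^15 red/blue colorings of K_6 contain one, so R(3, 3) = 6.
edges6 = [frozenset(e) for e in combinations(range(6), 2)]
print(all(has_mono_triangle(6, {e for e, bit in zip(edges6, bits) if bit})
          for bits in product((0, 1), repeat=len(edges6))))     # True

def red_clique_bound(N, k):
    """Union bound C(N, k) * 2^(-C(k, 2)) on p_R, as an exact fraction."""
    return Fraction(math.comb(N, k), 2 ** math.comb(k, 2))

for k in range(4, 13):
    N = math.ceil(2 ** (k / 2)) - 1             # largest integer N < 2^(k/2)
    print(k, N, red_clique_bound(N, k) < Fraction(1, 2))
```

The $K_6$ check enumerates 32768 colorings, which takes at most a few seconds in CPython.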


Of course, there is quite a gap between the lower and the upper bound for
R(k, k). Still, as simple as this Book Proof is, no lower bound with a better
exponent has been found for general k in the more than sixty years since
Erdős’ result. In fact, no one has been able to prove a lower bound of the
form $R(k,k) > 2^{(\frac12+\varepsilon)k}$ nor an upper bound of the form $R(k,k) < 2^{(2-\varepsilon)k}$
for a fixed ε > 0. The most spectacular advance in recent years is due to
David Conlon, who proved an upper bound of the form $R(k,k) < \frac{4^k}{k^{\omega(k)}}$, where ω(k)
tends to infinity (albeit very slowly) with k.


Our third result is another beautiful illustration of the probabilistic method.
Consider a graph G on n vertices and its chromatic number χ(G). If χ(G)
is high, that is, if we need many colors, then we might suspect that G
contains a large complete subgraph. However, this is far from the truth.
Already in the forties Blanche Descartes constructed graphs with arbitrarily
high chromatic number and no triangles, that is, with every cycle having
length at least 4, and so did several others (see the box below).
Constructing the Mycielski graph


However, in these examples there were many cycles of length 4. Can we do
even better? Can we stipulate that there are no cycles of small length and
still have arbitrarily high chromatic number? Yes we can! To make matters
precise, let us call the length of a shortest cycle in G the girth γ(G) of G;
then we have the following theorem, first proved by Paul Erdős.


Triangle-free graphs with high chromatic number
Here is a sequence of triangle-free graphs $G_3, G_4, \dots$ with

$$\chi(G_n) = n.$$

Start with $G_3 = C_5$, the 5-cycle; thus $\chi(G_3) = 3$. Suppose we have
already constructed $G_n$ on the vertex set V. The new graph $G_{n+1}$ has
the vertex set V ∪ V′ ∪ {z}, where the vertices v′ ∈ V′ correspond
bijectively to v ∈ V, and z is a single other vertex. The edges of
$G_{n+1}$ fall into 3 classes: First, we take all edges of $G_n$; secondly
every vertex v′ is joined to precisely the neighbors of v in $G_n$; thirdly
z is joined to all v′ ∈ V′. Hence from $G_3 = C_5$ we obtain as $G_4$ the
so-called Mycielski graph.

Clearly, $G_{n+1}$ is again triangle-free. To prove $\chi(G_{n+1}) = n + 1$ we
use induction on n. Suppose $G_{n+1}$ had an n-coloring; restrict it to $G_n$
and consider a color class C. There must exist a vertex v ∈ C which is adjacent to at
least one vertex of every other color class; otherwise we could distribute
the vertices of C onto the n − 1 other color classes, resulting
in $\chi(G_n) \le n - 1$. Since v′ (the vertex in V′ corresponding to v) is
adjacent to all neighbors of v, it must receive the same color as v.
So, all n colors appear in V′, and we need a new color for z.
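The construction in the box is short enough to run. This Python sketch uses function names of our own choosing. It builds $G_{n+1}$ from $G_n$ with the following labels:

- V = {0, ..., N − 1};
- V′ = {N, ..., 2N − 1}, where v′ = v + N;
- z = 2N.

It then checks triangle-freeness and computes the chromatic number by plain backtracking. That is only practical for small graphs such as $G_3 = C_5$ and the 11-vertex $G_4$.

```python
def mycielski(num_vertices, edges):
    """One step of the box construction. Vertices 0..N-1 form V, vertex v + N
    is the copy v' in V', and 2N is the extra vertex z."""
    N = num_vertices
    new_edges = set(edges)                            # first class: all edges of G_n
    for u, w in edges:                                # second class: v' ~ neighbors of v
        new_edges.add((u, w + N))
        new_edges.add((w, u + N))
    new_edges |= {(v + N, 2 * N) for v in range(N)}   # third class: z ~ all of V'
    return 2 * N + 1, new_edges

def adjacency(num_vertices, edges):
    adj = [set() for _ in range(num_vertices)]
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    return adj

def is_triangle_free(num_vertices, edges):
    adj = adjacency(num_vertices, edges)
    return all(not (adj[u] & adj[w]) for u, w in edges)

def chromatic_number(num_vertices, edges):
    """Smallest c admitting a proper c-coloring, found by backtracking."""
    adj = adjacency(num_vertices, edges)
    color = [-1] * num_vertices

    def extend(v, c):
        if v == num_vertices:
            return True
        for col in range(c):
            if all(color[w] != col for w in adj[v]):
                color[v] = col
                if extend(v + 1, c):
                    return True
        color[v] = -1                                 # undo before backtracking
        return False

    c = 1
    while not extend(0, c):
        c += 1
    return c

G3 = (5, {(i, (i + 1) % 5) for i in range(5)})     # the 5-cycle C_5
G4 = mycielski(*G3)                                  # the Mycielski graph
for G in (G3, G4):
    print(G[0], is_triangle_free(*G), chromatic_number(*G))
```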


Theorem 3. For every k ≥ 2, there exists a graph G with chromatic
number χ(G) > k and girth γ(G) > k.


The strategy is similar to that of the previous proofs: We consider a certain
probability space on graphs and go on to show that the probability for
χ(G) ≤ k is smaller than $\frac12$, and similarly the probability for γ(G) ≤ k
is smaller than $\frac12$. Consequently, there must exist a graph with the desired
properties.


Proof. Let $V = \{v_1, v_2, \dots, v_n\}$ be the vertex set, and p a fixed number
between 0 and 1, to be carefully chosen later. Our probability space
$\mathcal G(n,p)$ consists of all graphs on V where the individual edges appear with
probability p, independently of each other. In other words, we are talking
about a Bernoulli experiment where we throw in each edge with probability
p. As an example, the probability $\mathrm{Prob}(K_n)$ for the complete graph
is $\mathrm{Prob}(K_n) = p^{\binom n2}$. In general, we have $\mathrm{Prob}(H) = p^m(1-p)^{\binom n2 - m}$ if
the graph H on V has precisely m edges.
Let us first look at the chromatic number χ(G). By α = α(G) we denote
the independence number, that is, the size of a largest independent set in G.
Since in a coloring with χ = χ(G) colors all color classes are independent
(and hence of size ≤ α), we infer χα ≥ n. Therefore if α is small as
compared to n, then χ must be large, which is what we want.
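The inequality χα ≥ n uses nothing but the fact that color classes are independent. The sketch below is our own, and all names in it are hypothetical. It draws one graph from $\mathcal G(n,p)$, computes α by brute force, and colors the graph greedily. It then checks that every color class is independent and therefore has at most α vertices, so the number of classes is at least n/α. The brute force limits it to small n.

```python
import math
import random
from itertools import combinations

def sample_gnp(n, p, rng):
    """One draw from G(n, p): each of the C(n, 2) possible edges is present
    independently with probability p. Edges are stored as frozensets."""
    return {frozenset(e) for e in combinations(range(n), 2) if rng.random() < p}

def is_independent(S, edges):
    return all(frozenset(e) not in edges for e in combinations(S, 2))

def independence_number(n, edges):
    """alpha(G) by brute force over vertex subsets, largest first."""
    for size in range(n, 0, -1):
        if any(is_independent(S, edges) for S in combinations(range(n), size)):
            return size
    return 0

def greedy_coloring(n, edges):
    """Give each vertex the smallest color not used by an earlier neighbor."""
    color = {}
    for v in range(n):
        used = {color[w] for w in range(v) if frozenset((v, w)) in edges}
        color[v] = min(c for c in range(n) if c not in used)
    return color

n = 14
G = sample_gnp(n, 0.5, random.Random(7))
alpha = independence_number(n, G)
classes = {}
for v, c in greedy_coloring(n, G).items():
    classes.setdefault(c, []).append(v)

# Each color class is independent, so it has at most alpha vertices ...
assert all(is_independent(C, G) and len(C) <= alpha for C in classes.values())
# ... hence any proper coloring, an optimal one included, uses >= n/alpha colors.
print(alpha, len(classes), math.ceil(n / alpha))
```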

Suppose 2 ≤ r ≤ n. The probability that a fixed r-set in V is independent
is $(1-p)^{\binom r2}$, and we conclude by the same argument as in Theorem 2

$$\mathrm{Prob}(\alpha \ge r) \le \binom nr (1-p)^{\binom r2} \le n^r (1-p)^{\binom r2} = \left(n(1-p)^{\frac{r-1}{2}}\right)^r \le \left(n e^{-p(r-1)/2}\right)^r,$$

since $1 - p \le e^{-p}$ for all p.
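Given any fixed k > 0 we now choose $p := n^{-\frac{k}{k+1}}$, and proceed to show
that for n large enough,

$$\mathrm{Prob}\Big(\alpha \ge \frac{n}{2k}\Big) < \frac12. \qquad (3)$$

Indeed, since $n^{\frac{1}{k+1}}$ grows faster than log n, we have $n^{\frac{1}{k+1}} \ge 6k\log n$
for large enough n, and thus $p \ge 6k\,\frac{\log n}{n}$. For $r = \lceil \frac{n}{2k} \rceil$ this gives
$pr \ge 3\log n$, and thus

$$n e^{-p(r-1)/2} = n e^{-\frac{pr}{2}} e^{\frac p2} \le n e^{-\frac32 \log n} e^{\frac12} = n^{-\frac12} e^{\frac12} = \Big(\frac en\Big)^{\frac12},$$

which converges to 0 as n goes to infinity. Hence (3) holds for all $n \ge n_1$.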
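How large "n large enough" has to be can be explored numerically. The sketch below is our own, and its names are hypothetical. It uses natural logarithms and takes p and r exactly as in the proof. It returns the logarithm of the bound $(ne^{-p(r-1)/2})^r$, which avoids underflow. For k = 2 the step $n^{1/(k+1)} \ge 6k\log n$ fails at $n = 10^6$ but holds at $n = 10^7$. This illustrates that the argument is about existence rather than about small examples.

```python
import math

def alpha_bound_log(n, k):
    """Natural log of (n * exp(-p(r-1)/2))^r, the bound on Prob(alpha >= r),
    with p = n^(-k/(k+1)) and r = ceil(n/(2k)) as in the proof."""
    p = n ** (-k / (k + 1))
    r = math.ceil(n / (2 * k))
    return r * (math.log(n) - p * (r - 1) / 2)

def large_enough(n, k):
    """The step n^(1/(k+1)) >= 6k log n used in the proof."""
    return n ** (1 / (k + 1)) >= 6 * k * math.log(n)

k = 2
for n in (10**4, 10**6, 10**7, 10**8):
    # Does the proof's step apply, and is the bound on Prob(alpha >= r) below 1/2?
    print(n, large_enough(n, k), alpha_bound_log(n, k) < math.log(0.5))
```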
Now we look at the second parameter, γ(G). For the given k we want to
show that there are not too many cycles of length ≤ k. Let i be between 3
and k, and A ⊆ V a fixed i-set. The number of possible i-cycles on A is
clearly the number of cyclic permutations of A divided by 2 (since we may
traverse the cycle in either direction), and thus equal to $\frac{(i-1)!}{2}$. The total
number of possible i-cycles is therefore $\binom ni \frac{(i-1)!}{2}$, and every such cycle C
appears with probability $p^i$. Let X be the random variable which counts the
number of cycles of length ≤ k. In order to estimate X we use two simple
but beautiful tools. The first is linearity of expectation, and the second is
Markov’s inequality for nonnegative random variables, which says

$$\mathrm{Prob}(X \ge a) \le \frac{EX}{a},$$

where EX is the expected value of X. See the appendix to Chapter 17 for
both tools.

Let $X_C$ be the indicator random variable of the cycle C of, say, length i.
That is, we set $X_C = 1$ or 0 depending on whether C appears in the graph
or not; hence $EX_C = p^i$. Since X counts the number of all cycles of
length ≤ k we have $X = \sum X_C$, and hence by linearity

$$EX = \sum_{i=3}^{k} \binom ni \frac{(i-1)!}{2}\, p^i \le \frac12 \sum_{i=3}^{k} n^i p^i \le \frac12 (k-2)\, n^k p^k,$$



where the last inequality holds because $np = n^{\frac{1}{k+1}} \ge 1$. Applying now
Markov’s inequality with a = n/2, we obtain

$$\mathrm{Prob}\Big(X \ge \frac n2\Big) \le \frac{EX}{n/2} \le (k-2)\,\frac{(np)^k}{n} = (k-2)\, n^{-\frac{1}{k+1}}.$$

Since the right-hand side goes to 0 with n going to infinity, we infer that
$\mathrm{Prob}(X \ge \frac n2) < \frac12$ for $n \ge n_2$.
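The cycle estimate can be checked numerically as well. The sketch below is our own, and its names are hypothetical. It sets $p = n^{-k/(k+1)}$ and evaluates the exact sum for EX. It compares that sum with the bound $\frac12(k-2)(np)^k$ and prints the resulting Markov bound $EX/(n/2)$ next to $(k-2)n^{-1/(k+1)}$.

```python
import math

def expected_short_cycles(n, k, p):
    """Exact E[X] = sum_{i=3}^{k} C(n, i) (i-1)!/2 p^i: expected number of
    cycles of length 3..k in G(n, p), by linearity of expectation."""
    return sum(math.comb(n, i) * math.factorial(i - 1) / 2 * p ** i
               for i in range(3, k + 1))

def markov_bound(n, k):
    """(k-2) n^(-1/(k+1)), the final bound on Prob(X >= n/2)."""
    return (k - 2) * n ** (-1 / (k + 1))

k = 4
for n in (10**3, 10**5, 10**7):
    p = n ** (-k / (k + 1))
    ex = expected_short_cycles(n, k, p)
    crude = (k - 2) / 2 * (n * p) ** k           # the estimate (k-2)/2 * n^k p^k
    print(n, round(ex, 1), round(crude, 1), ex / (n / 2), markov_bound(n, k))
```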


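Now we are almost home. Since each of the two bad events has probability
less than $\frac12$, with positive probability neither occurs. So for $n \ge \max(n_1, n_2)$
there exists a graph H on n vertices with $\alpha(H) < \frac{n}{2k}$ and fewer than $\frac n2$
cycles of length ≤ k. Delete one vertex from each of these cycles, and
let G be the resulting graph. Then γ(G) > k holds at any rate. Since G
contains more than $\frac n2$ vertices and satisfies $\alpha(G) \le \alpha(H) < \frac{n}{2k}$, we find

$$\chi(G) \ge \frac{n/2}{\alpha(G)} \ge \frac{n}{2\alpha(H)} > \frac{n}{n/k} = k,$$

and the proof is finished.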
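The final deletion step can be imitated on a small random graph. The sketch below is our own variant, not the book's procedure verbatim. Instead of deleting one vertex from each short cycle at once, it repeatedly deletes a vertex lying on a shortest cycle until the girth exceeds k. Each deletion destroys at least one cycle of length ≤ k and creates none, so at most X vertices are removed. All names, and the parameters n = 40 and p = 0.15, are illustrative assumptions.

```python
import math
import random
from collections import deque
from itertools import combinations

def sample_gnp(n, p, rng):
    """Adjacency sets of one draw from G(n, p)."""
    adj = {v: set() for v in range(n)}
    for u, w in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(w)
            adj[w].add(u)
    return adj

def bfs_cycle_length(adj, s):
    """Length of the shortest closed walk that BFS from s closes with a
    non-tree edge (math.inf if none). Its minimum over all s is the girth,
    and a vertex s attaining that minimum lies on a shortest cycle."""
    dist, parent = {s: 0}, {s: None}
    queue, best = deque([s]), math.inf
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w], parent[w] = dist[u] + 1, u
                queue.append(w)
            elif parent[u] != w:                 # non-tree edge closes a walk
                best = min(best, dist[u] + dist[w] + 1)
    return best

def girth(adj):
    return min((bfs_cycle_length(adj, s) for s in adj), default=math.inf)

def destroy_short_cycles(adj, k):
    """Delete vertices on shortest cycles until no cycle of length <= k is left."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    removed = []
    while adj:
        lengths = {s: bfs_cycle_length(adj, s) for s in adj}
        s = min(lengths, key=lengths.get)
        if lengths[s] > k:
            break
        for w in adj.pop(s):
            adj[w].discard(s)
        removed.append(s)
    return adj, removed

H = sample_gnp(40, 0.15, random.Random(5))
G, removed = destroy_short_cycles(H, 4)
print(girth(H), len(removed), len(G), girth(G))
```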


Explicit constructions of graphs with high girth and chromatic number (of 
huge size) are known. (In contrast, one does not know how to construct 
red/blue colorings with no large monochromatic cliques, whose existence 
is given by Theorem 2.) What remains striking about the Erdős proof is
that it proves the existence of relatively small graphs with high chromatic 
number and girth. 


To end our excursion into the probabilistic world let us discuss an important
result in geometric graph theory (which again goes back to Paul Erdős),
with a stunning Book Proof.

Consider a simple graph G = G(V, E) with n vertices and m edges. We
want to embed G into the plane just as we did for planar graphs. Now, 
we know from Chapter 13 — as a consequence of Euler’s formula — that 
a simple planar graph G with n > 3 vertices has at most 3n — 6 edges. 
Hence if m is greater than 3n — 6, there must be crossings of edges. The 
crossing number cr(G) is then naturally defined: It is the smallest number 
of crossings among all drawings of G, where crossings of more than two 
edges in one point are not allowed. Thus cr(G) = 0 if and only if G is 
planar. 


In such a minimal drawing the following three situations are ruled out:


• No edge can cross itself.

• Edges with a common endvertex cannot cross.

• No two edges cross twice.

This is because in each of these cases, we can construct a different drawing
of the same graph with fewer crossings, using the operations that are indicated
in our figure. So, from now on we assume that any drawing observes
these rules.
Suppose that G is drawn in the plane with cr(G) crossings. We can immediately
derive a lower bound on the number of crossings. Consider the
following graph H: The vertices of H are those of G together with all
crossing points, and the edges are all pieces of the original edges as we go
along from crossing point to crossing point.

The new graph H is now plane and simple (this follows from our three
assumptions!). The number of vertices in H is n + cr(G) and the number
of edges is m + 2cr(G), since every new vertex has degree 4. Invoking the
bound on the number of edges for plane graphs we thus find

$$m + 2\,\mathrm{cr}(G) \le 3\big(n + \mathrm{cr}(G)\big) - 6,$$

that is,

$$\mathrm{cr}(G) \ge m - 3n + 6. \qquad (4)$$


As an example, for the complete graph $K_6$ we compute

$$\mathrm{cr}(K_6) \ge 15 - 18 + 6 = 3,$$

and, in fact, there is a drawing with just 3 crossings.


The bound (4) is good enough when m is linear in n, but when m is large
compared to n, the picture changes, and this is our theorem.


Theorem 4. Let G be a simple graph with n vertices and m edges, where
m ≥ 4n. Then

$$\mathrm{cr}(G) \ge \frac{1}{64}\,\frac{m^3}{n^2}.$$


The history of this result, called the crossing lemma, is quite interesting.
It was conjectured by Erdős and Guy in 1973 (with 1/64 replaced by some
constant c). The first proofs were given by Leighton in 1982 (with 1/100 instead
of 1/64) and independently by Ajtai, Chvátal, Newborn and Szemerédi.
The crossing lemma was hardly known (in fact, many people thought of it
as a conjecture long after the original proofs), until László Székely demonstrated
its usefulness in a beautiful paper, applying it to a variety of hitherto
hard geometric extremal problems. The proof which we now present arose
from e-mail conversations between Bernard Chazelle, Micha Sharir and
Emo Welzl, and it belongs without doubt in The Book.


Proof. Consider a minimal drawing of G, and let p be a number between
0 and 1 (to be chosen later). Now we generate a subgraph of G by selecting
the vertices of G to lie in the subgraph with probability p, independently
from each other. The induced subgraph that we obtain that way will be
called $G_p$.


Let $n_p$, $m_p$, $X_p$ be the random variables counting the number of vertices,
of edges, and of crossings in $G_p$. Since cr(G) − m + 3n ≥ 0 holds by (4)
for any graph, we certainly have

$$E(X_p - m_p + 3n_p) \ge 0.$$
Now we proceed to compute the individual expectations $E(n_p)$, $E(m_p)$ and
$E(X_p)$. Clearly, $E(n_p) = pn$ and $E(m_p) = p^2 m$, since an edge appears
in $G_p$ if and only if both its endvertices do. And finally, $E(X_p) = p^4\,\mathrm{cr}(G)$,
since a crossing is present in $G_p$ if and only if all four (distinct!) vertices
involved are there.
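These three expectations can be verified exactly on a toy example by summing over all $2^n$ vertex subsets. The sketch below is our own, and its names are hypothetical. It uses $K_4$ drawn with its vertices in convex position, so the only crossing is between the diagonals {0, 2} and {1, 3}. That drawing is an assumption made for the example. It is not a minimal drawing, since $K_4$ is planar, but the expectation computation works for any fixed drawing.

```python
from fractions import Fraction
from itertools import combinations

def exact_expectations(n, edges, crossings, p):
    """Exact E(n_p), E(m_p), E(X_p) when each vertex is kept independently
    with probability p. `crossings` lists pairs of edges that cross in a
    fixed drawing; a crossing survives iff all four endvertices survive."""
    e_n = e_m = e_x = Fraction(0)
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            keep = set(subset)
            weight = p ** size * (1 - p) ** (n - size)   # Prob(exactly `keep` survives)
            e_n += weight * size
            e_m += weight * sum(1 for u, w in edges if u in keep and w in keep)
            e_x += weight * sum(1 for (a, b), (c, d) in crossings
                                if {a, b, c, d} <= keep)
    return e_n, e_m, e_x

# K_4 with vertices 0,1,2,3 in convex position: one crossing, between the diagonals.
edges = list(combinations(range(4), 2))
crossings = [((0, 2), (1, 3))]
p = Fraction(1, 3)
print(exact_expectations(4, edges, crossings, p))
# (Fraction(4, 3), Fraction(2, 3), Fraction(1, 81)), i.e. (pn, p^2 m, p^4 cr)
```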


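By linearity of expectation we thus find

$$0 \le E(X_p) - E(m_p) + 3E(n_p) = p^4\,\mathrm{cr}(G) - p^2 m + 3pn,$$

which is

$$\mathrm{cr}(G) \ge \frac{p^2 m - 3pn}{p^4} = \frac{m}{p^2} - \frac{3n}{p^3}. \qquad (5)$$

Here comes the punch line: Set $p := \frac{4n}{m}$ (which is at most 1 by our assumption);
then pm = 4n, and (5) becomes

$$\mathrm{cr}(G) \ge \frac{1}{64}\left[\frac{4m}{(n/m)^2} - \frac{3n}{(n/m)^3}\right] = \frac{1}{64}\,\frac{m^3}{n^2},$$

and this is it.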
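The final computation can be double-checked with exact fractions. This sketch is our own, and its names are hypothetical. It evaluates the right-hand side of (5) at p = 4n/m and confirms that it equals $m^3/(64n^2)$. It then compares the crossing lemma with the bound (4) for $K_{100}$, where m = 4950 ≥ 4n, and reproduces the value 3 from (4) for $K_6$.

```python
from fractions import Fraction
from math import comb

def bound_4(n, m):
    """cr(G) >= m - 3n + 6, from Euler's formula."""
    return m - 3 * n + 6

def bound_5(n, m, p):
    """Right-hand side m/p^2 - 3n/p^3 of (5), for a sampling probability p."""
    return m / p ** 2 - 3 * n / p ** 3

def crossing_lemma(n, m):
    """Theorem 4: cr(G) >= m^3 / (64 n^2), valid when m >= 4n."""
    if m < 4 * n:
        raise ValueError("Theorem 4 assumes m >= 4n")
    return Fraction(m ** 3, 64 * n ** 2)

print(bound_4(6, comb(6, 2)))                      # 3, as computed for K_6
n, m = 100, comb(100, 2)
p = Fraction(4 * n, m)                             # the punch-line choice
assert bound_5(n, m, p) == crossing_lemma(n, m)
print(bound_4(n, m), float(crossing_lemma(n, m)))  # 4656 versus about 189511.5
```

For dense graphs such as $K_{100}$ the cubic bound is far stronger than the linear one, which is exactly the regime the theorem addresses.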


Paul Erdős would have loved to see this proof.
About the Illustrations


We are happy to have the possibility and privilege to illustrate this volume 
with wonderful original drawings by Karl Heinrich Hofmann (Darmstadt). 
Thank you! 

The regular polyhedra on p. 90 and the fold-out map of a flexible sphere 
on p. 98 are by WAF Ruppert. Jürgen Richter-Gebert provided illustrations
for p. 92; Ronald Wotzlaw wrote the graphics for p. 158. Jan Schneider,
Marie-Sophie Litz and Miriam Schlüter created the images for Chap. 15.
Page 281 features the Weisman Art Museum in Minneapolis designed by 
Frank Gehry. The photo of its west facade is by Chris Faust. The floorplan 
is of the Dolly Fiterman Riverview Gallery behind the west facade. 

The portraits of Bertrand, Cantor, Erdős, Euler, Fermat, Herglotz, Hilbert,
Littlewood, Pólya, Schur, Sylvester, and Van der Waerden are all from
the photo archives of the Mathematisches Forschungsinstitut Oberwolfach, 
with permission. (Many thanks to Annette Disch and Ivonne Vetter!) 

The Gauss portrait is a lithograph by Siegfried Detlev Bendixen published 
in Astronomische Nachrichten 1828, as provided by Wikipedia. The picture 
of Hermite is from the first volume of his collected works. 

The Eisenstein portrait is reproduced with friendly permission by Prof. 
Karin Reich from a collection of portrait cards owned by the Mathema- 
tische Gesellschaft Hamburg. 

The portrait stamps of Buffon, Chebyshev, Euler, and Ramanujan are from 
Jeff Miller’s mathematical stamps website http://jeff560.tripod.com with 
his generous permission. 

The photo of Claude Shannon was provided by the MIT Museum and is 
here reproduced with their permission. 

The portrait of Cayley is taken from the “Photoalbum für Weierstraß”
(edited by Reinhard Bölling, Vieweg 1994), with permission from the Kunstbibliothek,
Staatliche Museen zu Berlin, Preussischer Kulturbesitz.

The Cauchy portrait is reproduced with permission from the Collections 
de l’École Polytechnique, Paris. The picture of Fermat is reproduced from
Stefan Hildebrandt and Anthony Tromba: The Parsimonious Universe. 
Shape and Form in the Natural World, Springer-Verlag, New York 1996. 
The portrait of Ernst Witt is from volume 426 (1992) of the Journal für die
Reine und Angewandte Mathematik, with permission by Walter de Gruyter 
Publishers. It was taken around 1941. 

The photo of Karol Borsuk was taken in 1967 by Isaac Namioka, and is 
reproduced with his kind permission. 

We thank Dr. Peter Sperner (Braunschweig) for the portrait of his father, 
and Vera Sós for the photo of Paul Turán.

Thanks to Noga Alon for the portrait of A. Nilli! 
capacity of a polynomial, 171
coupon collector’s problem, 220
edge of a graph, 80
finite set system, 213
forest, 81 

formal power series, 241 
Gale’s theorem, 303
girth, 314

golden section, 295 

graph, 80 

graph coloring, 277 
Heine–Borel theorem, 40
Hilbert’s third problem, 67
homogeneous, 170 
hyper-binary representation, 130 


incidence matrix, 79, 198 
incident, 80 
independent set, 81, 271
induced subgraph, 81, 272 
inequalities, 143 

infinite products, 241 
intersecting family, 214, 301
involution, 22 

irrational numbers, 47 
isomorphic graphs, 81 


Jacobi determinants, 57 


Kakeya conjecture, 248 
Kneser graph, 301
Kneser’s conjecture, 302 
knot, 105 

knot theory, 99 

knots and links, 99 


labeled tree, 235 

Lagrange’s theorem, 4 

Latin rectangle, 254 

Latin square, 253, 271 

Latin squares, 265 

lattice, 93 

lattice basis, 94 

lattice paths, 229 

lattice points, 30 

law of quadratic reciprocity, 28 
Legendre symbol, 27 

Legendre’s theorem, 10 
lexicographically smallest solution, 70 
line graph, 276 

linear extension, 210 

linearity of expectation, 116, 190 
link, 105 

linked circles, 100 

list chromatic number, 272 

list coloring, 272, 278 
Littlewood–Offord problem, 179
log-convex function, 173 

loop, 80 

Lovász umbrella, 294

Lovász’ theorem, 297
Lyusternik–Shnirel’man theorem, 302


Markov’s inequality, 116 
marriage theorem, 216 
matching, 274 

matrix of rank 1, 119 
matrix-tree theorem, 237 
mean square average, 44 
Mersenne number, 4 

Minc’s conjecture, 261 
Minkowski symmetrization, 113 
mirror image, 75 

monomial, 248 

monotone subsequences, 196 
Monsky’s Theorem, 157 
multiple edges, 80 

museum guards, 281 


Mycielski graph, 314 


near-triangulated plane graph, 278 
nearly-orthogonal vectors, 118 
needles, 189 

neighbors, 80 

Newman’s function, 131 
non-Archimedean real valuation, 156 
non-Archimedean valuation, 160 


obtuse angle, 111 

odd function, 184 

order of a group element, 4 
ordered abelian group, 160 
ordered set, 139 

ordinal number, 139 
orthonormal representation, 294 
orthogonal matrix, 39 
outdegree, 273 


p-adic value, 156 

partial Latin square, 253 
partition, 241 

partition identities, 241 
path, 81 

path matrix, 229 

pearl lemma, 69 

Pell’s equation, 15 
pentagonal numbers, 243 
perfect matching, 261 
periodic function, 184 
permanent, 169, 261 
Petersen graph, 301 

Pick’s theorem, 93 
pigeon-hole principle, 195 
planar graph, 89 

plane graph, 89, 278 

point configuration, 83 
polygon, 73 

polyhedron, 67, 73 
polynomial with real roots, 146, 166 
polytope, 111 

prime field, 20 

prime number, 3, 9 

prime number theorem, 12 
probabilistic method, 311 
probability distribution, 286 
probability space, 116 
projective plane, 201


quadratic nonresidue, 27 
quadratic reciprocity, 28 
quadratic residue, 27 


rainbow triangle, 157 
Ramsey number, 312 
random variable, 116, 262 
rate of transmission, 291 
red-blue segment, 159 
Reidemeister moves, 99, 105 
Riemann zeta function, 62 
riffle shuffles, 225 
Rogers–Ramanujan identities, 245
rooted forest, 239 

roots of unity, 37 


scalar product, 118 
Schönhardt’s polyhedron, 282
segment, 68 

Shannon capacity, 292 
shuffling cards, 219 

simple graph, 80 

simplex, 74 

size of a set, 127 

slope problem, 83 

spectral theorem, 39 

speed of convergence, 59 
Sperner’s lemma, 203 
Sperner’s theorem, 213 
spherical dome, 101 
squares, 20 

stable matching, 274 

star, 79 

Stern’s diatomic series, 128 
Stirling’s formula, 13 
stopping rules, 222 


subgraph, 81 

sums of two squares, 19 

support of a random variable, 262 
Sylvester’s theorem, 15 
Sylvester–Gallai theorem, 77, 92


system of distinct representatives, 215 


tangential rectangle, 146 
tangential triangle, 146 
top-in-at-random shuffles, 221 
touching simplices, 107 
tree, 81 

triangle-free graph, 314 
trivial knot, 105 

trivial link, 105 

Turán graph, 285

two square theorem, 19 
2-colorable set system, 311 


umbrella, 294 
unimodal, 14 
unit d-cube, 74 


valuation ring, 160 

valuations, 155, 160 

Van der Waerden’s conjecture, 169 
vertex, 74, 80 

vertex degree, 90, 199, 272 
vertex-disjoint path system, 229 
volume, 98 


weighted directed graph, 229 
well-ordering theorem, 139 
windmill graph, 307 

winged shape, 23 

winged square, 23 


zero-error capacity, 292 
Zorn’s lemma, 161 


